This chapter is devoted to the theory of varieties, which provides an important tool,
based in universal algebra, for the classification of regular languages. In the introductory 
section, we present a number of examples that illustrate and motivate the fundamental 
concepts. We do this for the most part without proofs, and often without precise definitions,
leaving these to the formal development of the theory that begins in Section 2. Our
presentation of the theory draws heavily on the very recent work of Gehrke, Grigorieff 
and Pin [22] on the equational theory of lattices of regular languages. In the subsequent 
sections we consider in more detail aspects of varieties that were only briefly evoked in 
the introduction: Decidability, operations on languages, and characterizations in formal 
logic. 


1 Motivation and examples 

We refer the readers to Chapter 1, and specifically to Sections 4.2 and 4.3 of that chapter, 
for the notion of a language recognized by a morphism into a finite monoid, and for the 
definition of the syntactic monoid Synt(L) of a language L. 


1.1 Idempotent and commutative monoids 

When one begins the study of abstract algebra, groups are usually encountered before 
semigroups and monoids. The simplest example of a monoid that is not a group is the set 
{0,1} with the usual multiplication. We denote this monoid $U_1$.

What are the regular languages recognized by $U_1$? If $A$ is a finite alphabet and
$\varphi\colon A^* \to U_1$ is a morphism, then any language $L \subseteq A^*$ recognized by $\varphi$ (that is,
any set of the form $\varphi^{-1}(X)$ where $X \subseteq U_1$) has either the form $B^*$ or $A^* \setminus B^*$, where
$B \subseteq A$. In particular, membership of a word $w$ in $L$ depends only on the set $a(w)$ of
letters occurring in $w$ (see Example 4.9 in Chapter 1).

The property ‘membership of $w$ in $L$ depends only on $a(w)$’ is preserved under union
and complement, and thus defines a boolean algebra of regular languages. Of course, not
every language in this boolean algebra is recognized by $U_1$; for example, we could take
$L = a^* \cup b^*$. However, it follows from basic properties of the syntactic monoid that this
boolean algebra consists of precisely the languages recognized by finite direct products
of copies of $U_1$.
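As a small illustration (a sketch, not part of the original development), the language $a^* \cup b^*$ over $\{a, b\}$ is recognized by a morphism into $U_1 \times U_1$. The names `phi`, `ACCEPT` and `recognized` are introduced here for illustration only, and the comparison with a regular expression is limited to short words.

```python
import re
from itertools import product

def phi(word):
    """Morphism {a,b}* -> U1 x U1 sending a to (0,1) and b to (1,0)."""
    x, y = 1, 1                      # identity of U1 x U1
    for letter in word:
        dx, dy = (0, 1) if letter == "a" else (1, 0)
        x, y = x * dx, y * dy        # componentwise multiplication in {0,1}
    return (x, y)

# Images of the words that use at most one of the two letters.
ACCEPT = {(1, 1), (0, 1), (1, 0)}

def recognized(word):
    return phi(word) in ACCEPT

# Compare with the regular expression a*|b* on all words of length < 6.
WORDS = ["".join(p) for n in range(6) for p in product("ab", repeat=n)]
agrees = all(recognized(w) == bool(re.fullmatch(r"a*|b*", w)) for w in WORDS)
```

The accepting set has three elements, which is why a single copy of $U_1$ (with only two elements) cannot do the job.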

We have thus characterized a syntactic property of regular languages in terms of an
algebraic property of their syntactic monoids. The family of finite monoids that divide a
direct product of a finite number of copies of $U_1$ is itself closed under finite direct products
and division. Such a family of finite monoids is called a pseudovariety. This particular
pseudovariety is often denoted $\mathbf{J}_1$ in the literature (see footnote 1).

1.1.1 Decidability and equational description. Thus if we want to decide whether a
given language $L \subseteq A^*$ has this syntactic property, we can compute Synt(L) and try to
determine whether $\mathrm{Synt}(L) \in \mathbf{J}_1$. But how do we do that? There are, after all, infinitely
many monoids in $\mathbf{J}_1$. We can, however, bound the size of the search space in terms of $|A|$.
It is not hard to prove that if $M$ is a finite monoid, and

$$\varphi\colon A^* \to \underbrace{M \times \cdots \times M}_{r\ \text{times}}$$

is a morphism, then $N = \varphi(A^*)$ embeds into

$$\underbrace{M \times \cdots \times M}_{s\ \text{times}},$$

where $s = |M|^{|A|}$ (there are only $|M|^{|A|}$ distinct morphisms from $A^*$ to $M$, so repeated
components of $\varphi$ can be discarded). This settles, in a not very satisfactory way, the question
of deciding whether Synt(L) is in $\mathbf{J}_1$: The resulting ‘decision procedure’ (check all the divisors of
$U_1^{2^{|A|}}$ and see if Synt(L) is isomorphic to any of them!) is of course ridiculously impractical.
Fortunately, there is a better approach: $U_1$ is both commutative and idempotent (i.e.,
all its elements are idempotents). These two properties are preserved under direct products
and division, and consequently shared by all members of $\mathbf{J}_1$. That is, the idempotent and
commutative monoids form a pseudovariety that contains $\mathbf{J}_1$. Conversely, every idempotent
and commutative finite monoid belongs to $\mathbf{J}_1$. To see this, we make note of a fact that
will play a large role in this chapter: If $M$ is a finite monoid and $\varphi\colon A^* \to M$ is an onto
morphism, then

$$M \prec \prod_{m \in M} \mathrm{Synt}(\varphi^{-1}(m)).$$

In particular, every pseudovariety is generated by the syntactic monoids it contains. We
now observe that if $a(w_1) = a(w_2)$, and if $\varphi\colon A^* \to M$ is a morphism onto an idempotent
and commutative monoid, then $\varphi(w_1) = \varphi(w_2)$, since we can permute letters
and eliminate duplications in any word $w$ without changing its value under $\varphi$. Thus each
$\varphi^{-1}(m)$ satisfies our syntactic property, and so by the remark just made, $M \in \mathbf{J}_1$.

Footnote 1. It is also written $\mathbf{Sl}$ because its elements are called semilattices.

We can express ‘$M$ is idempotent and commutative’ by saying that $M$ satisfies the
identities $xy = yx$ and $x^2 = x$. This means that these equations hold no matter how we
substitute elements of $M$ for the variables $x$ and $y$. This equational characterization of
$\mathbf{J}_1$ provides a much more satisfactory procedure for determining if a monoid $M$ belongs
to $\mathbf{J}_1$: If $M$ is given by its multiplication table, then we can verify the identities in time
polynomial in $|M|$.
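The polynomial-time check can be sketched as follows. This is a minimal illustration under an assumed encoding, not a prescribed algorithm: a monoid is given as a list of lists `table` with elements numbered `0..n-1` and `table[x][y]` the product $xy$. The name `in_J1` is hypothetical.

```python
def in_J1(table):
    """Check the identities x^2 = x and xy = yx on a multiplication table.

    Both loops are bounded by n^2 table lookups, where n is the number of elements.
    """
    n = len(table)
    idempotent = all(table[x][x] == x for x in range(n))
    commutative = all(table[x][y] == table[y][x]
                      for x in range(n) for y in range(n))
    return idempotent and commutative

U1 = [[0, 0], [0, 1]]    # {0, 1} under ordinary multiplication
Z2 = [[0, 1], [1, 0]]    # integers mod 2 under addition (a group, not idempotent)
```

Here `in_J1(U1)` holds, while `in_J1(Z2)` fails because $1 + 1 = 0 \ne 1$ in the second table.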

1.1.2 Connection to logic. Before leaving this example, we note a connection with formal
logic. We express properties of words over $A^*$ by sentences of first-order logic in
which variables denote positions in a word. For each $a \in A$, our logic contains a unary
predicate $Q_a$, where $Q_a x$ is interpreted to mean ‘the letter in position $x$ is $a$’. We allow
only these formulas $Q_a x$ as atomic formulas; in particular, we do not include equality as
a predicate. A sentence in this logic, for example (with $A = \{a, b, c\}$)

$$\exists x\, \exists y\, \forall z\, (Q_a x \wedge Q_b y \wedge \neg Q_c z)$$

defines a language over $A^*$, in this case the set of all words containing both $a$ and $b$, but
with no occurrence of $c$. It is easy to see that the languages definable in this logic are
exactly those in which membership of a word $w$ depends only on $a(w)$.
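To make the semantics concrete, here is a hedged sketch that evaluates the example sentence (‘contains $a$ and $b$ but no $c$’) by brute force, with variables ranging over the positions of the word. The function name `models` is an assumption made for this illustration.

```python
from itertools import product

def models(word):
    """Evaluate  Ex Ey Az (Q_a x and Q_b y and not Q_c z)  over the positions of word."""
    positions = range(len(word))
    return any(
        all(word[x] == "a" and word[y] == "b" and word[z] != "c" for z in positions)
        for x in positions
        for y in positions
    )

# On all words of length < 5 over {a, b, c}, truth depends only on the set of letters.
WORDS = ["".join(p) for n in range(5) for p in product("abc", repeat=n)]
depends_on_content_only = all(models(w) == (set(w) == {"a", "b"}) for w in WORDS)
```

The empty word has no positions, so the existential quantifiers fail on it, in agreement with the description of the language.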

The following theorem summarizes the results of this subsection. 

Theorem 1.1. Let $A$ be a finite alphabet and let $L \subseteq A^*$ be a regular language. The
following are equivalent.

(i) Membership of $w$ in $L$ depends only on the set $a(w)$ of letters appearing in $w$.

(ii) $\mathrm{Synt}(L) \in \mathbf{J}_1$, that is, Synt(L) divides a finite direct product of copies of $U_1$.

(iii) Synt(L) satisfies the identities $xy = yx$ and $x^2 = x$.

(iv) $L$ is definable by a first-order sentence over the predicates $Q_a$, $a \in A$.


1.2 Piecewise-testable languages 

Suppose that instead of testing for occurrences of individual letters in a word, we test for
occurrences of non-contiguous sequences of letters, or subwords. More precisely, we say
that $v = a_1 \cdots a_k$, where each $a_i \in A$, is a subword of $w \in A^*$ if

$$w = w_0 a_1 w_1 \cdots a_k w_k$$

for some $w_0, \ldots, w_k \in A^*$. We also say that the empty word 1 is a subword of every word
in $A^*$. The set of all words in $A^*$ that contain $v$ as a subword is thus the regular language

$$L_v = A^* a_1 A^* \cdots a_k A^*.$$

We say that a language is piecewise-testable if it belongs to the boolean algebra generated
by the $L_v$.
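The subword relation is easy to test directly. The following sketch (with hypothetical names `is_subword` and `in_L_v`, and an assumed three-letter alphabet) checks it greedily and compares the result with a regular expression for $L_v$.

```python
import re

def is_subword(v, w):
    """True if v can be obtained from w by deleting letters (scattered subword)."""
    remaining = iter(w)
    # Each letter of v must be found in what is left of w, scanning left to right.
    return all(letter in remaining for letter in v)

def in_L_v(v, w, alphabet="abc"):
    """Membership of w in L_v = A* a_1 A* ... a_k A*, via a regular expression."""
    any_word = "[" + alphabet + "]*"
    pattern = any_word + any_word.join(re.escape(a) for a in v) + any_word
    return re.fullmatch(pattern, w) is not None
```

For instance, `is_subword("ab", "cacb")` holds while `is_subword("ba", "ab")` does not. A piecewise-testable language is then any boolean combination of such tests.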


1.2.1 Decidability and equational description. It is not clear that we can effectively
decide whether a given regular language is piecewise testable. For the language class
of 1.1, we were able to settle this question by in effect observing that for every finite
alphabet $A$ there were only finitely many languages of the class in $A^*$. For piecewise-testable
languages, this is no longer the case. It is possible, however, to obtain an algebraic
characterization of the piecewise-testable languages, and this leads to a fairly efficient
decision procedure. We first note two relatively easy-to-prove facts. First, the monoids
$\mathrm{Synt}(L_v)$ are all $\mathcal{J}$-trivial: This means that if $m, m', s, t, s', t' \in \mathrm{Synt}(L_v)$ are such that
$m = s'm't'$, $m' = smt$, then $m = m'$. Second, the family $\mathbf{J}$ of $\mathcal{J}$-trivial monoids forms
a pseudovariety. It follows then that the syntactic monoid of every piecewise-testable
language is $\mathcal{J}$-trivial. A deep theorem, due to I. Simon [43], shows that the converse is
true as well: Every language recognized by a finite $\mathcal{J}$-trivial monoid is piecewise-testable.

Clearly, we can effectively determine, from the multiplication table of a finite monoid
$M$, all the pairs $(m, m') \in M \times M$ such that $m' = smt$ for some $s, t \in M$, and
thus determine if $M \in \mathbf{J}$. This gives us an algebraic decision procedure for
piecewise-testability.
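One way to carry out this test is sketched below, under the assumption that the monoid is encoded as a table with `table[s][t]` the product $st$ and that it contains an identity. Two elements are $\mathcal{J}$-equivalent exactly when they generate the same two-sided ideal, so the monoid is $\mathcal{J}$-trivial when all these ideals are distinct. The name `is_J_trivial` is hypothetical.

```python
def is_J_trivial(table):
    """table[s][t] is the product st; elements are 0..n-1; an identity is assumed."""
    n = len(table)

    def two_sided_ideal(m):
        # The set M m M of all products s m t.
        return frozenset(table[table[s][m]][t] for s in range(n) for t in range(n))

    ideals = [two_sided_ideal(m) for m in range(n)]
    # m = s'm't' and m' = smt hold together exactly when the two ideals coincide.
    return all(ideals[m] != ideals[p] for m in range(n) for p in range(m + 1, n))

U1 = [[0, 0], [0, 1]]                                         # {0,1} under multiplication
Z2 = [[0, 1], [1, 0]]                                         # integers mod 2 under addition
M2 = [[min(i + j, 2) for j in range(3)] for i in range(3)]    # {1, m, m^2 = m^3}
```

The group `Z2` fails the test because both of its elements generate the whole monoid as an ideal.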

Can the pseudovariety $\mathbf{J}$ be defined by identities in the same manner as $\mathbf{J}_1$? The short
answer is ‘no’. This is because satisfaction of an identity $u = v$, where $u$ and $v$ are words
over an alphabet $\{x, y, \ldots\}$ of variables, is preserved by infinite direct products as well as
finite direct products and divisors. Consider now the monoids

$$M_j = \{1, m, m^2, \ldots, m^j = m^{j+1}\}.$$

Each $M_j \in \mathbf{J}$, but $\prod_{j \ge 1} M_j$ contains an isomorphic copy of the infinite cyclic monoid
$\{1, a, a^2, \ldots\}$, which has every finite cyclic group as a quotient. Thus every identity
satisfied by all the monoids in $\mathbf{J}$ is also satisfied by all the finite cyclic groups, which are
not in $\mathbf{J}$.

In spite of this, we can still obtain an equational description of $\mathbf{J}$, provided we adopt
an expanded notion of what constitutes an identity. If $s$ is an element of a finite monoid
$M$, then we denote by $s^\omega$ the unique idempotent power of $s$. We will allow identities in
which the operation $x \mapsto x^\omega$ may appear; these are special instances of what we
will call profinite identities. It is not hard to see that satisfaction of these new identities is
preserved under finite direct products and quotients, and thus every set of such identities
defines a pseudovariety.

For example, the profinite identity

$$x^\omega = x x^\omega$$

is satisfied by precisely the finite monoids that contain no nontrivial groups. This is the
pseudovariety of aperiodic monoids, which we denote $\mathbf{Ap}$. Similarly, the profinite identity

$$x^\omega = 1$$

defines the pseudovariety $\mathbf{G}$ of finite groups. As was the case with $\mathbf{J}$, neither of these
pseudovarieties can be defined by a set of ordinary identities.
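In a finite monoid given by its multiplication table, the idempotent power and the two identities above can be checked mechanically. The sketch below assumes the table encoding `table[s][t]` for the product $st$; the names `omega`, `is_aperiodic` and `is_group` are introduced only for this illustration.

```python
def omega(table, x):
    """Unique idempotent among the powers x, x^2, x^3, ... of x."""
    power = x
    while table[power][power] != power:
        power = table[power][x]          # move on to the next power of x
    return power

def is_aperiodic(table):
    """Check the identity x^omega = x x^omega on every element."""
    return all(table[x][omega(table, x)] == omega(table, x)
               for x in range(len(table)))

def is_group(table, identity):
    """Check the identity x^omega = 1 on every element."""
    return all(omega(table, x) == identity for x in range(len(table)))

Z3 = [[(i + j) % 3 for j in range(3)] for i in range(3)]      # cyclic group, identity 0
M2 = [[min(i + j, 2) for j in range(3)] for i in range(3)]    # {1, m, m^2 = m^3}, identity 0
```

On these two examples, `Z3` satisfies only the second identity and `M2` only the first.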

It can be shown that the pseudovariety $\mathbf{J}$ of finite $\mathcal{J}$-trivial monoids is defined by the
pair of profinite identities

$$(xy)^\omega x = (xy)^\omega, \qquad y(xy)^\omega = (xy)^\omega,$$

or, alternatively, by the pair

$$(xy)^\omega = (yx)^\omega, \qquad x^\omega = x x^\omega.$$
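As a sanity check on small examples (a sketch only, with hypothetical function names and the table encoding `table[s][t]` for the product $st$), both pairs of identities can be tested by substituting all pairs of elements for $x$ and $y$.

```python
def omega(table, x):
    """Unique idempotent power of x."""
    power = x
    while table[power][power] != power:
        power = table[power][x]
    return power

def satisfies_first_pair(table):
    """(xy)^omega x = (xy)^omega  and  y (xy)^omega = (xy)^omega."""
    n = len(table)
    for x in range(n):
        for y in range(n):
            e = omega(table, table[x][y])
            if table[e][x] != e or table[y][e] != e:
                return False
    return True

def satisfies_second_pair(table):
    """(xy)^omega = (yx)^omega  and  x^omega = x x^omega."""
    n = len(table)
    symmetric = all(omega(table, table[x][y]) == omega(table, table[y][x])
                    for x in range(n) for y in range(n))
    aperiodic = all(table[x][omega(table, x)] == omega(table, x) for x in range(n))
    return symmetric and aperiodic

U1 = [[0, 0], [0, 1]]
M2 = [[min(i + j, 2) for j in range(3)] for i in range(3)]          # {1, m, m^2 = m^3}
Z2 = [[0, 1], [1, 0]]                                               # group of order 2
RZ = [[x if y == 0 else y for y in range(3)] for x in range(3)]     # 1, a, b with ab = b, ba = a
```

The two checks agree on all four tables: they accept `U1` and `M2` and reject `Z2` and `RZ`.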


1.2.2 Connection with logic. Let us supplement the first-order logic for words that we
introduced earlier with atomic formulas of the form $x < y$, which is interpreted to mean
‘position $x$ is strictly to the left of position $y$’. The language $L_v$, where $v = a_1 \cdots a_k$, is
defined by the sentence

$$\exists x_1 \exists x_2 \cdots \exists x_k\, (x_1 < x_2 \wedge x_2 < x_3 \wedge \cdots \wedge x_{k-1} < x_k \wedge Q_{a_1} x_1 \wedge \cdots \wedge Q_{a_k} x_k).$$

This is a $\Sigma_1$-sentence: one in which all the quantifiers are in a single block of existential
quantifiers at the start of the sentence. It follows easily that a language is piecewise-testable
if and only if it is defined by a boolean combination of $\Sigma_1$-sentences.

The following theorem summarizes the results of this subsection. 

Theorem 1.2. Let $A$ be a finite alphabet and let $L \subseteq A^*$ be a regular language. The
following are equivalent.

(i) $L$ is piecewise testable.

(ii) $\mathrm{Synt}(L) \in \mathbf{J}$, that is, Synt(L) is $\mathcal{J}$-trivial.

(iii) Synt(L) satisfies the identities $(xy)^\omega = (yx)^\omega$ and $x x^\omega = x^\omega$.

(iv) Synt(L) satisfies the identities $(xy)^\omega x = (xy)^\omega = y(xy)^\omega$.

(v) $L$ is definable by a boolean combination of $\Sigma_1$-sentences over the predicates $<$ and
$Q_a$, $a \in A$.

1.3 Pseudovarieties of monoids and varieties of languages 

We tentatively extract a few general principles from the preceding discussion. These
will be explored at length in the subsequent sections. Given a pseudovariety $\mathbf{V}$ of finite
monoids and a finite alphabet $A$, we form the family $A^*\mathcal{V}$ of all regular languages
$L \subseteq A^*$ for which $\mathrm{Synt}(L) \in \mathbf{V}$. We can think of $\mathcal{V}$ itself as an operator that associates
to each finite alphabet $A$ a family of regular languages over $A$. $\mathcal{V}$ is called a variety of
languages. (We will give a very different, although equivalent definition of this term in
our formal discussion in Section 2.) From our earlier observation that pseudovarieties are
generated by the syntactic monoids they contain, it follows that if $\mathbf{V}$ and $\mathbf{W}$ are distinct
pseudovarieties, then the associated varieties of languages $\mathcal{V}$ and $\mathcal{W}$ are also distinct. Thus
there is a one-to-one correspondence between varieties of languages and pseudovarieties
of finite monoids.

Often we are interested in the following sort of decision problem: Given a regular
language $L \subseteq A^*$, does it belong to some predefined family $\mathcal{V}$ of regular languages, for
example, the languages definable in some logic? If $\mathcal{V}$ forms a variety of languages, then
we can answer the question if we have some effective criterion for determining if a given
finite monoid belongs to the corresponding pseudovariety $\mathbf{V}$. (The converse is true as
well: if we could decide the question about membership in the variety of languages, we
would be able to decide membership in $\mathbf{V}$.)

Pseudovarieties are precisely the families of finite monoids defined by sets of profinite
identities. For the time being this assertion (a theorem due to Reiterman) will have
to remain somewhat vague, since we haven’t even come close to saying what a profinite
identity actually is! Such equational characterizations of pseudovarieties are frequently
the source of the decision procedures discussed above.

If $\mathcal{V}$ is a variety of languages, then, as we have seen, each $A^*\mathcal{V}$ is closed under boolean
operations. Observe further that if $L \in A^*\mathcal{V}$ and $v \in A^*$, then both of the quotient
languages

$$v^{-1}L = \{w \in A^* \mid vw \in L\}$$

$$Lv^{-1} = \{w \in A^* \mid wv \in L\}$$

are in $A^*\mathcal{V}$, because any monoid recognizing $L$ also recognizes the quotients. For the
same reason, if $\varphi\colon B^* \to A^*$ is a morphism, $\varphi^{-1}(L)$ is in $B^*\mathcal{V}$. An important result, due
to Eilenberg, showed that these closure properties characterize varieties of languages.
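The remark that a recognizer for $L$ also handles its quotients has a familiar automata-theoretic counterpart, sketched here on a small deterministic automaton. The automaton, the language $A^* a A^* b A^*$ and all names are assumptions chosen for illustration: a left quotient only changes the start state, and a right quotient only changes the set of final states.

```python
from itertools import product

STATES = {0, 1, 2}
START, FINAL = 0, {2}
# DFA for L = A* a A* b A* over A = {a, b}: state 1 = "seen a", state 2 = "seen a, then b".
DELTA = {(0, "a"): 1, (0, "b"): 0, (1, "a"): 1, (1, "b"): 2, (2, "a"): 2, (2, "b"): 2}

def run(state, word):
    for letter in word:
        state = DELTA[(state, letter)]
    return state

def in_L(word):
    return run(START, word) in FINAL

def left_quotient_start(v):
    """Start state of an automaton for v^{-1}L (same transitions, same final states)."""
    return run(START, v)

def right_quotient_finals(v):
    """Final states of an automaton for Lv^{-1} (same transitions, same start state)."""
    return {q for q in STATES if run(q, v) in FINAL}

WORDS = ["".join(p) for n in range(5) for p in product("ab", repeat=n)]
```

For example, `run(left_quotient_start("a"), w) in FINAL` holds exactly when `"a" + w` is in $L$.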

Theorem 1.3. Let $\mathcal{V}$ assign to each finite alphabet $A$ a family $A^*\mathcal{V}$ of regular languages
in $A^*$. $\mathcal{V}$ is a variety of languages if and only if the following three conditions hold:

(i) Each $A^*\mathcal{V}$ is closed under boolean operations.

(ii) If $L \in A^*\mathcal{V}$ and $w \in A^*$, then $w^{-1}L \in A^*\mathcal{V}$ and $Lw^{-1} \in A^*\mathcal{V}$.

(iii) If $L \in A^*\mathcal{V}$ and $\varphi\colon B^* \to A^*$ is a morphism of finitely generated free monoids,
then $\varphi^{-1}(L) \in B^*\mathcal{V}$.

This theorem can be quite useful for showing, in the absence of an explicit algebraic 
characterization of the corresponding pseudovariety of monoids, that a combinatorially or 
logically defined family of languages forms a variety. We conclude from this that such an 
algebraic characterization in principle exists. 

Although it is somewhat involved, Theorem 1.3 is quite elementary; see [18, 32]. In the
next section we will revisit the definition of varieties of languages and profinite identities 
in a way that will permit us to prove both Theorem 1.3 and Reiterman’s theorem in a 
single argument. 

Before we proceed with this program, we briefly describe certain classes of regular 
languages which admit syntactic characterizations (that is: characterizations in terms of 
syntactic monoids and syntactic morphisms), but which are not varieties in the sense 
described above. 


1.4 Extensions 

Interesting classes of regular languages frequently admit characterizations in terms of 
their syntactic monoids and syntactic morphisms, and the theory sketched above is meant 
to provide a formal setting for this algebraic classification of regular languages. However, 
the framework is not adequate to capture all the examples of interest that arise. Here we 
give three examples. 

Consider, first, the family $A^*\mathcal{K}_1$ of languages $L \subseteq A^*$ for which membership of $w$
in $L$ is determined by the leftmost letter of $w$. This class forms a boolean algebra closed
under quotients, but is not a variety of languages. To see this, note that
$a(a + b)^* \in \{a, b\}^*\mathcal{K}_1$ and $c^*a(a + b + c)^* \notin \{a, b, c\}^*\mathcal{K}_1$, even though the two languages have the
same syntactic monoid. Alternatively, we can reason using Theorem 1.3, and note that
the second language is an inverse homomorphic image of the first, and thus $\mathcal{K}_1$ fails to be
a variety of languages. More generally, we can define the family $A^*\mathcal{K}_d$ of languages $L$
for which membership of $w$ in $L$ depends only on the leftmost $\min(|w|, d)$ letters of $w$,
as well as $A^*\mathcal{K} = \bigcup_{d > 0} A^*\mathcal{K}_d$. All these families are closed under boolean operations
and quotients, yet fail to be varieties of languages.
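The inverse-image argument can be checked on short words. In this sketch the morphism is assumed to be the one that erases $c$ and fixes $a$ and $b$; the function names are hypothetical.

```python
import re
from itertools import product

def in_first(word):            # a(a+b)* over {a, b}
    return re.fullmatch(r"a[ab]*", word) is not None

def in_second(word):           # c*a(a+b+c)* over {a, b, c}
    return re.fullmatch(r"c*a[abc]*", word) is not None

def erase_c(word):             # morphism {a,b,c}* -> {a,b}*: a -> a, b -> b, c -> empty word
    return word.replace("c", "")

WORDS = ["".join(p) for n in range(5) for p in product("abc", repeat=n)]
# The second language is the inverse image of the first under erase_c (on short words).
is_inverse_image = all(in_second(w) == in_first(erase_c(w)) for w in WORDS)
```

The words `"ca"` and `"cb"` share their leftmost letter, yet only the first belongs to the second language, so membership there is not determined by the leftmost letter.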

We obtain an example with a similar flavor if we supplement the predicate logic described
earlier by atomic formulas $x \equiv_q 0$, where $q > 1$, which is interpreted to mean that
position $x$ is divisible by $q$. (We assume that positions in a word are numbered, beginning
with 1 for the leftmost position.) We denote by $A^*\mathcal{QA}$ the family of languages over $A^*$
definable in this logic. Languages in $A^*\mathcal{QA}$ arise as the regular languages definable in the
circuit complexity class $AC^0$ (see [10]). Each $A^*\mathcal{QA}$ is a boolean algebra closed under
quotients, however $\mathcal{QA}$ is not a variety of languages: To see this, consider the morphism
$\{a, b\}^* \to \{a\}^*$ that maps $a$ to $a$ and $b$ to the empty string. The set $\{a^{2n} \mid n \ge 0\}$ is in
$\{a\}^*\mathcal{QA}$, as it is defined by the sentence

$$\forall x\, (\forall y\, (y \le x) \to x \equiv_2 0),$$

where $y \le x$ abbreviates $\neg(x < y)$. However the inverse image of this language under the
morphism is the set of strings over $\{a, b\}$ with an even number of occurrences of $a$, and it is
possible to prove by model-theoretic means that this language is not definable in our logic.
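A brute-force check of this example on short words is sketched below. The sentence is read as ‘the last position, if there is one, is even’; the function names are assumptions made for this illustration.

```python
from itertools import product

def last_position_is_even(word):
    """For every position x (numbered from 1): if x is the last position, then 2 divides x."""
    n = len(word)
    return all(x % 2 == 0 for x in range(1, n + 1) if x == n)

def erase_b(word):
    """The morphism {a,b}* -> {a}* with a -> a and b -> empty word."""
    return word.replace("b", "")

WORDS = ["".join(p) for n in range(6) for p in product("ab", repeat=n)]
inverse_image = [w for w in WORDS if last_position_is_even(erase_b(w))]
even_number_of_a = [w for w in WORDS if w.count("a") % 2 == 0]
```

On all words of length less than 6 the two lists coincide, as the text asserts for the full languages.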

Finally, consider the family $A^*\mathcal{J}^+$ of languages definable by $\Sigma_1$-sentences over the
predicates $<$ and $Q_a$ with $a \in A$ (in contrast to the languages definable by boolean
combinations of $\Sigma_1$-sentences, which we considered earlier). It is easy to see that if
$L \in A^*\mathcal{J}^+$ and $w \in L$, then $L_w \subseteq L$. This readily implies that $A^*\mathcal{J}^+$ is not closed
under complement, since, for example, the complement of $(a + b)^*a(a + b)^*$ does not
have this property. Thus $\mathcal{J}^+$ is not a variety of languages. On the other hand, it does
satisfy many of the properties of varieties of languages: It is closed under finite unions
and intersections, quotients, and inverse images of morphisms between free monoids.

It turns out that each of these three examples admits an algebraic characterization in
terms of classes that are very much like pseudovarieties. For our first example, in which
membership of a word in a language is determined by the leftmost letter, the correct
generalization of pseudovarieties was already known to Eilenberg: One looks not at the
syntactic monoid of a language $L$, but at the image of the set $A^+$ of nonempty words
under the syntactic morphism. This is called the syntactic semigroup of $L$. We can define
pseudovarieties of finite semigroups just as we defined pseudovarieties of finite monoids.
Then $L \in A^*\mathcal{K}_1$ if and only if its syntactic semigroup belongs to the pseudovariety of
semigroups defined by the identity $xy = x$. While $\mathcal{K}_1$ is not closed under inverse images
of morphisms between free monoids, it is closed if we restrict ourselves to non-erasing
morphisms: those that map every letter to a nonempty word.

We can use a similar method to characterize the class $\mathcal{QA}$. Once again we look not
just at the syntactic monoid of a language $L$, but at the additional structure provided
by the syntactic morphism $\eta_L$. It is known that $L \in A^*\mathcal{QA}$ if and only if for every
$k \ge 0$, $\eta_L(A^k)$ contains no nontrivial groups [10]. The family $\mathbf{QA}$ of morphisms from
free monoids onto finite monoids with this property forms a kind of pseudovariety with
respect to appropriately modified definitions of direct product and division. An equational




H. Straubing, P. Weil 


characterization of $\mathbf{QA}$ is provided by the identity

$$(x^{\omega-1} y)^\omega = (x^{\omega-1} y)^{\omega+1},$$

where the identity is interpreted in the following sense: $\varphi \in \mathbf{QA}$ if and only if for all
words $u$ and $v$ of the same length, $x = \varphi(u)$ and $y = \varphi(v)$ satisfy the identity. $\mathcal{QA}$ is
closed under inverse images of morphisms $f\colon B^* \to A^*$ such that $f(B) \subseteq A^k$ for some
$k \ge 0$; these are called length multiplying morphisms. In fact, these last two examples
are instances of a single phenomenon: Families of morphisms $\varphi\colon A^* \to M$ onto finite
monoids that form pseudovarieties with respect to some underlying composition-closed
class $\mathcal{C}$ of morphisms between free monoids.

For the example $\mathcal{J}^+$ of $\Sigma_1$-definable languages, the algebraic characterization involves
a different generalization of pseudovarieties. Here the additional structure on the
syntactic monoid is provided by the embedding of $\eta_L(L)$ in Synt(L): If $m_1, m_2 \in \mathrm{Synt}(L)$
then we say $m_1 \le m_2$ if

$$\{(s,t) \in \mathrm{Synt}(L) \times \mathrm{Synt}(L) \mid s m_2 t \in \eta_L(L)\} \subseteq \{(s,t) \in \mathrm{Synt}(L) \times \mathrm{Synt}(L) \mid s m_1 t \in \eta_L(L)\}.$$

This gives a partial order on Synt(L) compatible with multiplication (see Section 4.4 in
Chapter 1). We then find that $L \in A^*\mathcal{J}^+$ if and only if this ordered syntactic monoid
satisfies the inequality $x \le 1$ for each element $x$. The family of partially-ordered monoids
satisfying this inequality is a pseudovariety of ordered finite monoids: it is closed under
finite direct products, and order-compatible submonoids and quotients. The theory of
pseudovarieties of ordered monoids and the corresponding positive varieties of languages
is due to Pin [33].

In the next section we will formally develop the framework that gives the correspondence
between pseudovarieties and language varieties, and the definition by profinite
identities, in a very general setting. Pseudovarieties of finite monoids, as well as all the 
generalizations mentioned above, will appear as special cases. 


2 Equations, identities and families of languages 

The original statement of Eilenberg’s theorem dealt exclusively with varieties of languages.
Here we will show how to use a whole hierarchy of increasingly complex equational
characterizations of increasingly structured families of languages. Before we describe
these results, we need to give a quick introduction to the free profinite monoid and
its connection to the theory of regular languages.


2.1 The free profinite monoid 

Say that a finite monoid $M$ separates two words $u, v \in A^*$ if there exists a morphism
$\varphi\colon A^* \to M$ such that $\varphi(u) \ne \varphi(v)$. Note that if $u \ne v$, there always exists such a
monoid. Indeed, for each $n \ge 1$, consider the quotient monoid $A^*/A^{\ge n}$: it consists of
the set of words of length less than $n$, plus a zero, and each product with length at least $n$
(in $A^*$) is equal to 0. Then $A^*/A^{\ge n}$ separates $u$ and $v$ if $n > \max(|u|, |v|)$. We denote
by $r(u, v)$ the minimum cardinality of a monoid separating $u$ and $v$.
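The quotient monoid used in this argument is simple to realize concretely. The sketch below is an illustration with assumed names (`ZERO`, `mult`, `phi`): words of length less than $n$ are kept as strings, and every longer product collapses to a zero element.

```python
ZERO = None   # the zero element of A*/A^{>=n}

def mult(x, y, n):
    """Product in A*/A^{>=n}: concatenate, or collapse to zero once the length reaches n."""
    if x is ZERO or y is ZERO or len(x) + len(y) >= n:
        return ZERO
    return x + y

def phi(word, n):
    """Canonical morphism A* -> A*/A^{>=n}, computed letter by letter."""
    value = ""                      # the empty word is the identity
    for letter in word:
        value = mult(value, letter, n)
    return value
```

With $n = 3$ the words `"ab"` and `"ba"` are separated, since both have length less than 3, while `"abab"` and `"baba"` are both sent to zero and would need a larger $n$.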

The profinite distance on $A^*$ is defined by letting $d(u, v) = 2^{-r(u,v)}$ if $u \ne v$ and
$d(u, u) = 0$. One verifies easily that $d$ is in fact an ultrametric distance (it satisfies
the ultrametric inequality $d(u, v) \le \max(d(u, w), d(v, w))$, stronger than the triangle
inequality), and the above discussion shows that the resulting metric space is Hausdorff.

The topology thus defined on $A^*$ is not especially interesting: we get a discrete space,
where a sequence $(u_n)_n$ converges to a word $u$ if and only if $(u_n)_n$ is ultimately equal to
$u$. This can be verified using the monoids $A^*/A^{\ge n}$ described above. There are, however,
non-trivial Cauchy sequences. In fact, one can show the following.

Proposition 2.1. A sequence $(u_n)_n$ is Cauchy if and only if for each morphism $\varphi\colon A^* \to M$
into a finite monoid, the sequence $(\varphi(u_n))_n$ is ultimately constant.

For instance, if $u$ is a word, then $(u^{n!})_n$ is a Cauchy sequence (this can be deduced
from the fact that its image under any morphism into a finite monoid is ultimately constant),
but it is non-trivial if $u \ne 1$. In topological terms, the uniform structure defined by
the profinite distance is non-trivial.

Using a classical construction from topology (analogous to the construction of the real
numbers from the rationals), we can now consider the completion of $(A^*, d)$, denoted by
$\widehat{A^*}$. It can be viewed as the quotient of the set of Cauchy sequences in $(A^*, d)$ by the
relation identifying two sequences $(u_n)$ and $(v_n)$ if the mixed sequence, alternating the
terms of $(u_n)$ and $(v_n)$, is Cauchy as well. In particular, $A^*$ is naturally seen as a dense
subset of $\widehat{A^*}$.

The following results can be verified by elementary means. 

Proposition 2.2. Let $A$ be an alphabet.

(1) The multiplication operation $(u, v) \mapsto uv$ in $A^*$ is uniformly continuous.

(2) Every morphism $\varphi\colon A^* \to B^*$ between free monoids, and every morphism $\psi\colon A^* \to M$
from a free monoid to a finite monoid (equipped with the discrete distance) is
uniformly continuous.

(3) $\widehat{A^*}$ is a compact space.

By a standard property of completions, it follows from Proposition 2.2 (1) that the
multiplication of $A^*$ can be extended to $\widehat{A^*}$: the resulting monoid is called the free profinite
monoid on $A$. Similarly, Proposition 2.2 (2) shows that each morphism $\varphi\colon A^* \to B^*$
between free monoids (resp. each morphism $\psi\colon A^* \to M$ from a free monoid to a
finite monoid) admits a uniquely defined continuous extension, $\hat\varphi\colon \widehat{A^*} \to \widehat{B^*}$ (resp.
$\hat\psi\colon \widehat{A^*} \to M$).

For example, consider the Cauchy sequence $(u^{n!})_n$, where $u \in A^*$, which we discussed
above. This represents an element of $\widehat{A^*}$, which we will denote $u^\omega$. Observe that
for any morphism $\varphi$ from $A^*$ into a finite monoid, the sequence $\varphi(u^{n!})$ is ultimately constant
and equal to the unique idempotent power of $\varphi(u)$, so in the notation we introduced
earlier we have, very conveniently,

$$\hat\varphi(u^\omega) = \varphi(u)^\omega.$$
We can similarly define $u^{\omega-1}$ as the element of $\widehat{A^*}$ represented by the Cauchy sequence $(u^{n!-1})_n$.
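The stabilization of the images of $u^{n!}$ can be observed in any finite monoid. The sketch below uses, as an assumed example, the monogenic monoid $\{1, x, x^2, x^3, x^4\}$ with $x^5 = x^2$, and represents each element by its reduced exponent.

```python
from math import factorial

INDEX, PERIOD = 2, 3     # monogenic monoid {1, x, x^2, x^3, x^4} with x^5 = x^2

def reduce_exponent(k):
    """Smallest exponent j with x^j = x^k in this monoid."""
    return k if k < INDEX else INDEX + (k - INDEX) % PERIOD

# Images of x^(n!) for n = 1, ..., 8, as reduced exponents.
factorial_powers = [reduce_exponent(factorial(n)) for n in range(1, 9)]
omega = factorial_powers[-1]                             # exponent of the limit value
omega_minus_one = reduce_exponent(factorial(8) - 1)      # image of x^(8! - 1)
```

The sequence becomes constant from $n = 3$ on, its limit $x^3$ is idempotent, and the element computed for $x^{\omega-1}$ multiplied by $x$ gives back $x^\omega$.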

Finally, we note the strong connection between regular languages and free profinite 
monoids. 

Proposition 2.3. Let $A$ be an alphabet and let $L \subseteq A^*$.

(1) $L$ is regular if and only if its topological closure in $\widehat{A^*}$, $\overline{L}$, is clopen (i.e., open and
closed), if and only if $L = K \cap A^*$ for some clopen set $K \subseteq \widehat{A^*}$.

(2) If $L$ is regular and $u \in \widehat{A^*}$, then the following are equivalent:

(i) $u \in \overline{L}$;

(ii) $\hat\varphi(u) \in \varphi(L)$ for every morphism $\varphi$ from $A^*$ to a finite monoid;

(iii) $\hat\varphi(u) \in \varphi(L)$ for every morphism $\varphi$ from $A^*$ to a finite monoid recognizing
$L$;

(iv) $\hat\eta(u) \in \eta(L)$ where $\eta$ is the syntactic morphism of $L$.


2.2 Equations and lattices of languages 

We begin our study of families of regular languages with the simplest such family: a lattice
of languages over a fixed alphabet. In this chapter, we define a lattice of languages
over an alphabet $A$ to be a set of languages over $A$ which is closed under finite union and
finite intersection, and which contains $A^*$ and $\emptyset$ (respectively, the intersection and the
union of an empty family of languages).

A profinite equation on $A$ is a pair $(u, v)$ of elements of $\widehat{A^*}$, usually denoted by $u \to v$.
If $u, v \in A^*$, the equation is called explicit. A language $L \subseteq A^*$ is said to satisfy the
equation $u \to v$, written $L \vdash u \to v$, if

$$u \in \overline{L} \implies v \in \overline{L}.$$

Remark 2.4. It is important to note that u, v and the words in L are all defined over 
the same alphabet A. In contrast to the identities we encountered in Section 1, in this 
definition, the letters occurring in u and v are not considered as variables, to be replaced 
by arbitrary elements. We will formally define identities in Section 2.4. 

The notion of equation is particularly relevant for regular languages. The following 
results directly from Proposition 2.3. 

Proposition 2.5. Let $L \subseteq A^*$ be regular and let $u, v \in \widehat{A^*}$.

(1) If $u, v \in A^*$, then $L \vdash u \to v$ if and only if $u \in L \implies v \in L$.

(2) If $\eta$ is the syntactic morphism of $L$, then $L \vdash u \to v$ if and only if $\hat\eta(u) \in \eta(L) \implies
\hat\eta(v) \in \eta(L)$.

Let $E$ be a set of equations on $A$. We denote by $\mathcal{L}(E)$ the set of regular languages in
$A^*$ which satisfy all the equations in $E$. It is immediately verified that this set is closed
under unions and intersections. Further, both $\emptyset$ and $A^*$ satisfy every equation. So $\mathcal{L}(E)$
is a lattice. The main theorem of this section states that all lattices of regular languages
arise this way.
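For explicit equations, satisfaction is just an implication between two membership tests, and the closure under union and intersection can be seen on examples. The following sketch uses languages given as Python predicates; all names and the chosen languages are assumptions for illustration.

```python
def satisfies(language, u, v):
    """Explicit equation u -> v: membership of u in the language forces membership of v."""
    return (not language(u)) or language(v)

def contains_a(w):
    return "a" in w

def even_length(w):
    return len(w) % 2 == 0

def union(w):
    return contains_a(w) or even_length(w)

def intersection(w):
    return contains_a(w) and even_length(w)

def no_a(w):                      # complement of contains_a
    return not contains_a(w)
```

Both `contains_a` and `even_length` satisfy the equation `"ab" -> "ba"`, and so do their union and intersection. Complements behave differently: `contains_a` satisfies `"b" -> "ab"`, whereas `no_a` satisfies only the reversed equation `"ab" -> "b"`.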

Theorem 2.6. Let $\mathcal{L}$ be a class of regular languages in $A^*$. Then $\mathcal{L}$ is a lattice if and only
if there exists a set $E$ of profinite equations on $A$ such that $\mathcal{L} = \mathcal{L}(E)$.

We have already seen that one direction of this equivalence holds: every set of the
form $\mathcal{L}(E)$ is a lattice. The proof of the converse is obtained after several steps. The first
concerns the set of equations satisfied by a given language. If $L \subseteq A^*$, let

$$E_L = \{(u, v) \in \widehat{A^*} \times \widehat{A^*} \mid L \vdash u \to v\}.$$

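Lemma 2.7. If $L$ is regular, then $E_L$ is clopen.

Proof. By definition of the satisfaction of equations, we have

$$E_L = \{(u, v) \in \widehat{A^*} \times \widehat{A^*} \mid (u \notin \overline{L}) \vee (v \in \overline{L})\} = (\overline{L}^c \times \widehat{A^*}) \cup (\widehat{A^*} \times \overline{L}).$$

Lemma 2.7 follows from the fact that $\widehat{A^*}$, $\overline{L}$ and $\overline{L}^c$ are compact (since $L$ is regular). □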

The proof of the next claim illustrates the crucial role played by the compactness of
$\widehat{A^*}$. Let $\mathcal{L}$ be a lattice of regular languages in $A^*$ and let $E_{\mathcal{L}} = \bigcap_{L \in \mathcal{L}} E_L$.

Lemma 2.8. Let $L$ be a regular language in $\mathcal{L}(E_{\mathcal{L}})$: that is, $L$ satisfies all the profinite
equations satisfied by all the elements of $\mathcal{L}$. Then there exists a finite subset $\mathcal{K}$ of $\mathcal{L}$ such
that $L \in \mathcal{L}(E_{\mathcal{K}})$.

Proof. By Lemma 2.7, $E_L$ and each $E_K^c$ ($K \in \mathcal{L}$) are open sets. Moreover, if $(u, v)$
does not belong to any of the $E_K^c$ ($K \in \mathcal{L}$), then $(u, v)$ belongs to each $E_K$, that is,
every language in $\mathcal{L}$ satisfies $u \to v$. It follows that $L$ satisfies $u \to v$ as well, that is,
$(u, v) \in E_L$. Therefore $E_L$ and the $E_K^c$ ($K \in \mathcal{L}$) form an open cover of $\widehat{A^*} \times \widehat{A^*}$.

By compactness, there exists a finite subcollection $\mathcal{K}$ of $\mathcal{L}$ such that $\widehat{A^*} \times \widehat{A^*}$ is covered
by $E_L$ and the $E_K^c$, $K \in \mathcal{K}$. It follows that $E_L$ contains the complement of $\bigcup_{K \in \mathcal{K}} E_K^c$,
namely the intersection $\bigcap_{K \in \mathcal{K}} E_K$. That is, $L$ satisfies all the equations satisfied by the
elements of $\mathcal{K}$, which establishes the claim. □

We are now ready to prove Theorem 2.6, by showing that if $\mathcal{L}$ is a lattice of regular
languages in $A^*$, then $\mathcal{L} = \mathcal{L}(E_{\mathcal{L}})$. It is immediate by construction that $\mathcal{L}$ is contained
in $\mathcal{L}(E_{\mathcal{L}})$. Let us now consider a language $L \in \mathcal{L}(E_{\mathcal{L}})$. By Lemma 2.8, we have
$L \in \mathcal{L}(E_{\mathcal{K}})$ for a finite subset $\mathcal{K}$ of $\mathcal{L}$.

For each $u \in L$, let $\mathcal{K}(u)$ be the intersection of the languages $K \in \mathcal{K}$ containing $u$.
Even though $L$ may be infinite, $\mathcal{K}(u)$ takes only finitely many values since $\mathcal{K}$ is finite. By
definition of the $\mathcal{K}(u)$, we have $L \subseteq \bigcup_{u \in L} \mathcal{K}(u)$, a finite union.

Conversely, let $v \in \bigcup_{u \in L} \mathcal{K}(u)$. Then there exists a word $u \in L$ such that $v$ belongs
to every $K \in \mathcal{K}$ containing $u$. That is, every $K \in \mathcal{K}$ satisfies the equation $u \to v$. In other
words, $u \to v$ lies in $E_{\mathcal{K}}$, and hence $L$ satisfies that equation. Since $u \in L$, it follows that
$v \in L$. Thus $L = \bigcup_{u \in L} \mathcal{K}(u)$ and hence $L \in \mathcal{L}$, which concludes the proof.
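The last step of the proof has a finite analogue that can be run directly. The sketch below makes two simplifying assumptions that are not in the text: languages are replaced by subsets of a finite universe of short words, and only explicit equations between words of that universe are considered. All names are hypothetical.

```python
from itertools import product

UNIVERSE = ["".join(p) for n in range(4) for p in product("ab", repeat=n)]

def K_of(u, family):
    """Intersection of the members of family that contain u (the universe if none does)."""
    result = set(UNIVERSE)
    for K in family:
        if u in K:
            result &= K
    return result

def satisfies_common_equations(L, family):
    """True if L satisfies every equation u -> v satisfied by all members of family."""
    for u in UNIVERSE:
        for v in UNIVERSE:
            common = all((u not in K) or (v in K) for K in family)
            if common and u in L and v not in L:
                return False
    return True

has_a = {w for w in UNIVERSE if "a" in w}
has_b = {w for w in UNIVERSE if "b" in w}
family = [has_a, has_b]
L = has_a & has_b
union_of_K = set().union(*(K_of(u, family) for u in L))
```

Here `L` satisfies all the common equations of the family and coincides with the union of the sets `K_of(u, family)` for `u` in `L`, which exhibits it as a finite union of finite intersections of members of the family.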

2.3 More classes of languages: from lattices to varieties

Here we explore how classes of regular languages that are more structured than lattices 
can be defined by more structured sets of equations. We start with an elementary lemma. 

Lemma 2.9. Let $\mathcal{L}$ be a lattice of regular languages satisfying the profinite equation
$u \to v$.

(1) If $\mathcal{L}$ is closed under complementation, then $\mathcal{L}$ also satisfies $v \to u$.

(2) If $\mathcal{L}$ is closed under quotients, then $\mathcal{L}$ satisfies the equations $xuy \to xvy$, for all
$x, y \in \widehat{A^*}$.

Proof. It follows from the definition of equations that $L$ satisfies $u \to v$ if and only if its
complement satisfies $v \to u$. The first part of the claim follows immediately.

It is also elementary that, if $x, y \in A^*$ and $x^{-1}Ly^{-1} \vdash u \to v$, then $L \vdash xuy \to xvy$.
Thus, if $\mathcal{L}$ is closed under quotients, then $\mathcal{L}$ satisfies all the equations $xuy \to xvy$ with
$x, y \in A^*$. This holds also if $x, y \in \widehat{A^*}$ since $E_{\mathcal{L}}$ is closed and $A^*$ is dense in $\widehat{A^*}$. □
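The first part of the proof is a contraposition, which can be spelled out as follows. This derivation is a sketch that assumes $L$ is regular, so that by Proposition 2.3 the closure of the complement $L^c$ is the complement of $\overline{L}$ in $\widehat{A^*}$.

```latex
\begin{align*}
L \vdash u \to v
  &\iff \bigl(u \in \overline{L} \implies v \in \overline{L}\bigr) \\
  &\iff \bigl(v \notin \overline{L} \implies u \notin \overline{L}\bigr) \\
  &\iff \bigl(v \in \overline{L^c} \implies u \in \overline{L^c}\bigr) \\
  &\iff L^c \vdash v \to u .
\end{align*}
```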

We now extend the notion of profinite equations as follows: if $u, v \in \widehat{A^*}$, we say that
a language $L$ satisfies the symmetrical equation $u \leftrightarrow v$ if $L$ satisfies both $u \to v$ and
$v \to u$.

We also say that a language $L$ satisfies the profinite inequality $v \le u$ if it satisfies all
the equations of the form $xuy \to xvy$ with $x, y \in \widehat{A^*}$, and it satisfies the profinite equality
$u = v$ if it satisfies both $u \le v$ and $v \le u$. The verification of the following corollary is
now elementary.

Corollary 2.10. Let $\mathcal{L}$ be a set of regular languages in $A^*$. 

(1) $\mathcal{L}$ is a boolean algebra if and only if $\mathcal{L} = \mathcal{L}(E)$ for some set $E$ of symmetrical 
profinite equations on $A$. 

(2) $\mathcal{L}$ is a lattice closed under quotients if and only if $\mathcal{L} = \mathcal{L}(E)$ for some set $E$ of 
profinite inequalities on $A$. 

(3) $\mathcal{L}$ is a boolean algebra closed under quotients if and only if $\mathcal{L} = \mathcal{L}(E)$ for some 
set $E$ of profinite equalities on $A$. 


2.4 Identities and varieties 

We now come to the historically and mathematically important class of varieties. Varieties 
of languages were defined in Section 1.3 but we will not use this definition here. In fact, 
in the course of this section, we will give an alternate, equivalent definition of varieties. 

An important difference between varieties and the lattices of languages over a fixed 
alphabet discussed so far in Section 2 is that a variety $\mathcal{V}$ consists of a collection of 
lattices $A^*\mathcal{V}$, one for each finite alphabet $A$. More generally, we define a class of regular 
languages $\mathcal{V}$ to be an operator which assigns to each finite alphabet $A$ a family $A^*\mathcal{V}$ of 
regular languages in $A^*$. 

First, we prove a technical lemma. 

Lemma 2.11. Let $\varphi: A^* \to B^*$ be a morphism, $L \subseteq B^*$ and $u, v \in \widehat{A^*}$. 

(1) $\hat\varphi(u) \in \overline{L}$ if and only if $u \in \overline{\varphi^{-1}(L)}$. 

(2) $L$ satisfies $\hat\varphi(u) \to \hat\varphi(v)$ if and only if $\varphi^{-1}(L)$ satisfies $u \to v$. 

Proof. The first statement is trivial if $u, v \in A^*$: indeed, $\hat\varphi$ and $\varphi$ coincide on words, 
and the intersection of $\overline{L}$ (resp. $\overline{\varphi^{-1}(L)}$) with $B^*$ (resp. $A^*$) is $L$ (resp. $\varphi^{-1}(L)$). 
The extension to the case where $u, v \in \widehat{A^*}$ is obtained by density. 

The second statement follows immediately from the first and the definition of profinite 
equations. □ 

We extend the notion of profinite equations, this time to profinite identities, to permit 
the treatment of classes of regular languages instead of lattices of regular languages over 
a fixed alphabet. Since there is no alphabet of reference anymore, we will usually denote 
by $X$ the alphabet over which profinite identities are written. 

Let $\mathcal{C}$ be a composition-closed class of morphisms between free monoids, $u, v \in \widehat{X^*}$ 
and $L \subseteq A^*$, where $X$ and $A$ are finite, but possibly different alphabets. We say that 
$L$ $\mathcal{C}$-identically satisfies $u \to v$ if, for each morphism $\varphi: X^* \to A^*$ in $\mathcal{C}$, $L$ satisfies 
$\hat\varphi(u) \to \hat\varphi(v)$. We say that a class of regular languages $\mathcal{V}$ $\mathcal{C}$-identically satisfies an 
equation if $A^*\mathcal{V}$ does, for each finite alphabet $A$. 

The following statement is a direct application of Lemma 2.11. 

Corollary 2.12. Let $\mathcal{V}$ be a class of regular languages and let $\mathcal{C}$ be a family of morphisms 
between free monoids closed under composition, such that whenever $\varphi: A^* \to B^*$ is in 
$\mathcal{C}$ and $L \in B^*\mathcal{V}$, then $\varphi^{-1}(L) \in A^*\mathcal{V}$. 

If $X^*\mathcal{V}$ satisfies the profinite equation $u \to v$ (with $u, v \in \widehat{X^*}$), then $\mathcal{V}$ $\mathcal{C}$-identically 
satisfies $u \to v$. 

Using the notions introduced in Section 2.3, we say that $L$ satisfies the profinite 
$\mathcal{C}$-identity $u = v$ (resp. profinite ordered $\mathcal{C}$-identity $u \le v$) if $L$ $\mathcal{C}$-identically satisfies 
$u = v$ (resp. $u \le v$). If $E$ is a set of profinite equations and for each finite alphabet $A$, 
$A^*\mathcal{V}$ is the set of regular languages in $A^*$ which $\mathcal{C}$-identically satisfy the elements of $E$, 
we say that the resulting class of regular languages $\mathcal{V}$ is $\mathcal{C}$-defined by $E$. 

Let us now define (positive) $\mathcal{C}$-varieties: a class $\mathcal{V}$ of regular languages is a positive 
$\mathcal{C}$-variety (resp. a $\mathcal{C}$-variety) of languages if each $A^*\mathcal{V}$ is a lattice (resp. a boolean 
algebra) closed under quotients and if, for each $\varphi: A^* \to B^*$ in $\mathcal{C}$ and each $L \in B^*\mathcal{V}$, 
we have $\varphi^{-1}(L) \in A^*\mathcal{V}$. 

If $\mathcal{C}$ is the class of all morphisms between free monoids, we drop the prefix $\mathcal{C}$ and 
simply talk of (ordered) profinite identities and (positive) varieties of languages. 

Collecting Corollaries 2.10 and 2.12, we have the following characterizations. 

Theorem 2.13. Let $\mathcal{V}$ be a class of regular languages and let $\mathcal{C}$ be a composition-closed 
class of morphisms between free monoids. Then $\mathcal{V}$ is a positive $\mathcal{C}$-variety (resp. a 
$\mathcal{C}$-variety) if and only if $\mathcal{V}$ is $\mathcal{C}$-defined by a set of profinite ordered $\mathcal{C}$-identities 
(resp. profinite $\mathcal{C}$-identities). 

Remark 2.14. In Section 1.3, we gave a different definition of varieties of languages, and 
Theorem 1.3 stated that it was equivalent to the definition given above. We will prove this 
equivalence in Section 2.5 below, thus formally reconciling the two definitions. 

2.5 Eilenberg’s and Reiterman’s theorems 

We note that (in)equalities can be interpreted in the (ordered) syntactic monoid of a 
language. Let $L$ be a regular language in $A^*$ and let $u, v \in \widehat{A^*}$. By Proposition 2.5, if $\eta$ is 
the syntactic morphism of $L$, then $L \models v \le u$ if and only if $\hat\eta(v) \le_L \hat\eta(u)$. 

Thus membership of a regular language L in a lattice of regular languages closed 
under quotients is characterized by properties of the syntactic morphism of L. 

We can also interpret identities in abstract finite ordered monoids (that is, finite 
monoids in which there is a partial order compatible with multiplication). If $u, v \in \widehat{X^*}$, 
we say that a finite ordered monoid $M$ satisfies the profinite identity $u \le v$ if for every 
morphism $\varphi: X^* \to M$ we have $\hat\varphi(u) \le \hat\varphi(v)$. Likewise a monoid $M$ satisfies the 
profinite identity $u = v$ if for each such $\varphi$ we have $\hat\varphi(u) = \hat\varphi(v)$. We extend this notion 
further to $\mathcal{C}$-satisfaction of identities. We call a morphism $\varphi: A^* \to M$, where $M$ is finite 
and $\varphi$ maps onto $M$, a stamp. We also define ordered stamps as morphisms from a free 
monoid $A^*$ onto an ordered finite monoid. (Such morphisms are automatically 
order-preserving if we consider the trivial ordering on $A^*$ in which $w_1 \le w_2$ if and only if 
$w_1 = w_2$.) Let $\mathcal{C}$ be a class of morphisms between finitely generated free monoids that is 
closed under composition and that contains all the length-preserving morphisms. We say 
that the ordered stamp $\varphi: A^* \to (M, \le)$ $\mathcal{C}$-satisfies the profinite identity $u \le v$ with 
$u, v \in \widehat{X^*}$ if and only if for all morphisms $\psi: X^* \to A^*$ with $\psi \in \mathcal{C}$, we have 
$\hat\varphi(\hat\psi(u)) \le \hat\varphi(\hat\psi(v))$. We similarly define $\mathcal{C}$-satisfaction of identities $u = v$ by (not 
necessarily ordered) stamps. 

We have already defined pseudovarieties of finite monoids in Section 1. We can extend 
this definition to define $\mathcal{C}$-pseudovarieties of stamps. We call a collection $\mathbf{V}$ of stamps a 
$\mathcal{C}$-pseudovariety if it satisfies the following two conditions: 

(i) If $\varphi: A^* \to M$ is in $\mathbf{V}$, $\psi: B^* \to A^*$ is in $\mathcal{C}$, and $\eta$ is a morphism from $\mathrm{Im}(\varphi\psi)$ 
onto a finite monoid $N$, then $\eta\varphi\psi: B^* \to N$ is in $\mathbf{V}$. 

(ii) If $\varphi_i: A^* \to M_i$ are in $\mathbf{V}$ for $i = 1, 2$, then $\varphi_1 \times \varphi_2: A^* \to \mathrm{Im}(\varphi_1 \times \varphi_2) \subseteq 
M_1 \times M_2$ is in $\mathbf{V}$. 

If we restrict the morphisms occurring in these definitions to order-preserving morphisms 
of ordered monoids, we obtain the definition of ordered $\mathcal{C}$-pseudovarieties of stamps. 
Ordinary pseudovarieties coincide with $\mathcal{C}$-pseudovarieties in the case where $\mathcal{C}$ contains all 
morphisms between finitely generated free monoids. 

We say that a class $\mathbf{V}$ of finite (ordered) monoids is defined by a set $E$ of identities 
(written $\mathbf{V} = [\![E]\!]$) if $\mathbf{V}$ consists of all the finite (ordered) monoids that satisfy all of the 
identities in $E$. Similarly, we say that a family $\mathbf{V}$ of stamps is $\mathcal{C}$-defined by $E$ (we write 
$\mathbf{V} = [\![E]\!]_{\mathcal{C}}$) if $\mathbf{V}$ consists of all the stamps that $\mathcal{C}$-satisfy these identities. 

Further, if $\mathbf{V}$ is a class of monoids or stamps, ordered or unordered, we define the 
corresponding class $\mathcal{V}$ of languages by setting $L \in A^*\mathcal{V}$ if and only if $\mathrm{Synt}(L) \in \mathbf{V}$ (if $\mathbf{V}$ 
is a class of monoids) or $\eta_L \in \mathbf{V}$ (if $\mathbf{V}$ is a class of stamps), where $\eta_L$ is the syntactic 
morphism of $L$. We write $\mathbf{V} \mapsto \mathcal{V}$ to denote this correspondence. 

This leads us to a restatement of Eilenberg’s Theorem, Theorem 1.3 above, as well as 
its generalization to C-varieties, and allows us to prove it simultaneously with Reiterman’s 
Theorem. 


Theorem 2.15. 

(1) (Eilenberg’s Theorem) If $\mathbf{V}$ is a pseudovariety (respectively $\mathcal{C}$-pseudovariety, 
ordered pseudovariety) and $\mathbf{V} \mapsto \mathcal{V}$, then $\mathcal{V}$ is a variety of languages (respectively 
$\mathcal{C}$-variety of languages, positive variety of languages) and in each case this gives a 
one-to-one correspondence between pseudovarieties and varieties of languages. 

(2) (Reiterman’s Theorem) A class $\mathbf{V}$ of monoids (stamps, ordered monoids) is a 
pseudovariety (respectively $\mathcal{C}$-pseudovariety, ordered pseudovariety) if and only if it is 
defined ($\mathcal{C}$-defined) by a set of profinite identities. 

In the argument we sketch below, we confine ourselves to the case of ordinary monoids, 
but everything generalizes in an entirely straightforward fashion to ordered monoids and 
stamps. The key to the proofs of both parts of the theorem is Theorem 2.13 above, along 
with the following elementary but very useful lemma, already brought to the reader’s 
attention in Section 1.1. 

Lemma 2.16. Let $\varphi: A^* \to M$ be a morphism onto a finite monoid. Then $M$ divides the 
direct product of the syntactic monoids of the languages $\varphi^{-1}(m)$, $m \in M$. 

Proof. For each $m \in M$, let $\eta_m: A^* \to \mathrm{Synt}(\varphi^{-1}(m))$ be the syntactic morphism of 
$\varphi^{-1}(m)$. It suffices to show that for each $u, v \in A^*$, $\eta_m(u) = \eta_m(v)$ for each $m \in M$ 
implies $\varphi(u) = \varphi(v)$. 

Indeed, let $m = \varphi(u)$. Then $u \in \varphi^{-1}(m)$ and since $\eta_m(v) = \eta_m(u)$, we have 
$v \in \varphi^{-1}(m)$, that is, $\varphi(v) = m = \varphi(u)$. □ 

Corollary 2.17. Every pseudovariety of monoids is generated by the syntactic monoids it 
contains. 

Proof. The result follows directly from Lemma 2.16, since $M$ recognizes each $\varphi^{-1}(m)$ 
($m \in M$): thus each $\mathrm{Synt}(\varphi^{-1}(m))$ divides $M$ and hence lies in the pseudovarieties 
containing $M$. □ 


Now let $\mathcal{V}$ be a variety of languages and let $E$ be a set of profinite identities defining 
$\mathcal{V}$. Let also $\mathbf{V}$ be the class of finite monoids satisfying the profinite identities in $E$. It is 
easily verified that $\mathbf{V}$ is a pseudovariety. 

Moreover, if $L$ is a regular language in $A^*$, we have $L \in A^*\mathcal{V}$ if and only if $L \models E$, if 
and only if $\mathrm{Synt}(L)$ satisfies the profinite identities in $E$, if and only if $\mathrm{Synt}(L) \in \mathbf{V}$. 

Thus $\mathbf{V} \mapsto \mathcal{V}$ in the correspondence described in Section 1.3. If $\mathbf{W}$ is another 
pseudovariety such that $\mathbf{W} \mapsto \mathcal{V}$, then $\mathbf{V}$ and $\mathbf{W}$ contain the same syntactic monoids, and 
Corollary 2.17 shows that $\mathbf{V} = \mathbf{W}$. This establishes Eilenberg’s Theorem. 

For Reiterman’s Theorem, we start with a pseudovariety $\mathbf{V}$ and consider the associated 
variety of languages $\mathcal{V}$. The above reasoning shows that $\mathbf{V}$ is defined by any set of profinite 
identities which, seen in the setting of classes of languages, defines $\mathcal{V}$. 

Note that these proofs are different from the classical proofs of Eilenberg’s theorem, 
in [18] or [32], and of Reiterman’s theorem, in [3], [36] or [38]. 

2.6 Examples of varieties 


We now look at some concrete instances of varieties, revisiting our examples from Section 
1, among others, in light of the theory presented above. In doing so, we will work from 
both sides of the correspondence between pseudovarieties and varieties of languages, at 
times beginning with a variety of languages, at others with a property of a class of finite 
monoids. 

2.6.1 Idempotent and commutative monoids We begin, as before, with the variety of 
languages corresponding to the pseudovariety $\mathbf{J}_1$. For each finite alphabet $A$, let $A^*\mathcal{J}_1$ 
be the smallest boolean-closed family of subsets of $A^*$ that contains all the languages 
$B^*$, where $B \subseteq A$. Equivalently, it is the smallest boolean-closed set containing all the 
$A^*aA^*$ ($a \in A$). Putting it again differently, $A^*\mathcal{J}_1$ is precisely the family of languages $L$ 
in $A^*$ for which membership of a word $w$ in $L$ depends only on the set $\alpha(w)$ of letters of 
$w$. This is because 

$$\{v \in A^* \mid \alpha(v) = \alpha(w)\} = \alpha(w)^* \setminus \bigcup_{B \subsetneq \alpha(w)} B^*.$$


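Observe that for all $a \in A$ and $B \subseteq A$, 

$$a^{-1}B^* = B^*a^{-1} = \begin{cases} B^* & \text{if } a \in B, \\ \emptyset & \text{otherwise.} \end{cases}$$

Further, if $C$ is another finite alphabet and $\varphi: C^* \to A^*$ is a morphism, 

$$\varphi^{-1}(B^*) = (C \cap \varphi^{-1}(B^*))^*.$$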


Left and right quotient and inverse image under morphisms all commute with boolean 
operations. So these two observations imply, independently of any algebraic 
considerations, that $\mathcal{J}_1$ is a variety of languages, and thus, by Theorem 2.13, is defined by a 
set of profinite identities. Further, from our proof of Eilenberg’s Theorem, the same set of 
identities defines the corresponding pseudovariety of finite monoids. 

Of course, we have already exhibited these identities, but let us see what they look 
like in the context of our equational theory. Let $X = \{x, y\}$, and let $A$ be any finite 
alphabet. Every language $L \in A^*\mathcal{J}_1$ satisfies the identities $xy = yx$ and $x^2 = x$, since 
for any morphism $\varphi: X^* \to A^*$ and any $u, v \in A^*$, $\alpha(u\varphi(xy)v) = \alpha(u\varphi(yx)v)$, and 
$\alpha(u\varphi(x^2)v) = \alpha(u\varphi(x)v)$. Conversely, suppose $L \subseteq A^*$ satisfies these identities. We 
will show $L \in A^*\mathcal{J}_1$. Let $w, w' \in A^*$, with $w \in L$ and $\alpha(w) = \alpha(w')$. We claim $w' \in L$. 
Since $\alpha(w) = \alpha(w')$, we can transform both $w$ and $w'$ into a common normal form $w''$ 
by successively interchanging adjacent letters until the word is sorted (with respect to 
some total ordering on $A$) and then replacing occurrences of $aa$ by $a$, where $a \in A$. 
Interchanging adjacent letters entails replacing $ua_1a_2v$ by $ua_2a_1v$, where $u, v \in A^*$ and 
$a_1, a_2 \in A$. Since $L$ satisfies the identity $xy = yx$, we have $ua_1a_2v \in L$ if and only if 
$ua_2a_1v \in L$ (using the morphism $\varphi: X^* \to A^*$ that maps $x, y$ to $a_1, a_2$, respectively). 
Similarly, replacing $aa$ by $a$ preserves membership in $L$ in both directions, since $L$ 
satisfies the identity $x^2 = x$. Hence $w \in L$ implies $w'' \in L$, which in turn implies $w' \in L$. 
Thus $\mathcal{J}_1$ is defined by this pair of identities. It follows that the corresponding 
pseudovariety $\mathbf{J}_1$ of finite monoids is defined by the same pair of identities, and thus 
consists of the idempotent and commutative monoids. 
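
As a concrete illustration of this pair of identities, the following Python sketch tests them on a finite monoid given by its multiplication table, and computes the normal form used in the argument above. The encoding of a monoid as a list of rows indexed by 0..n-1, and the names `is_idempotent_and_commutative` and `normal_form`, are assumptions made for this example only.

```python
from itertools import product


def is_idempotent_and_commutative(table):
    """Check the identities xy = yx and x^2 = x in a finite monoid.

    table[a][b] is the product of the elements a and b, where the
    elements are numbered 0..n-1 (an encoding chosen for this sketch).
    """
    n = len(table)
    commutative = all(table[a][b] == table[b][a]
                      for a, b in product(range(n), repeat=2))
    idempotent = all(table[a][a] == a for a in range(n))
    return commutative and idempotent


def normal_form(word):
    """Sort the letters of `word` and collapse repetitions.

    This mirrors rewriting with xy = yx and x^2 = x: the result depends
    only on the set of letters occurring in `word`.
    """
    return "".join(sorted(set(word)))


# U1 = {0, 1} under ordinary multiplication (element i is the number i).
U1 = [[0, 0],
      [0, 1]]
# The cyclic group of order 2 (element 0 is the identity).
Z2 = [[0, 1],
      [1, 0]]
```

With these tables, `is_idempotent_and_commutative(U1)` holds while `is_idempotent_and_commutative(Z2)` fails, because the generator of the group is not idempotent. Two words have the same normal form exactly when they have the same set of letters.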

2.6.2 Piecewise-testable languages Now let us consider the piecewise-testable languages 
of Section 1.2. We denote the family of piecewise-testable languages over a finite 
alphabet $A$ by $A^*\mathcal{J}$. Let us look at the profinite identities satisfied by these languages. As 
observed earlier (Section 2.1), if $u \in \widehat{X^*}$ then the sequence $(u^{n!})_n$ is a Cauchy sequence 
whose limit is written $u^\omega$. Moreover, for any morphism $\varphi: X^* \to A^*$, where $A$ is a finite 
alphabet, $\hat\varphi(u^\omega) = (\hat\varphi(u))^\omega$ (the idempotent power of $\hat\varphi(u)$). Now let $X = \{x, y\}$. We 
claim that every piecewise-testable language $L$ in $A^*$ satisfies the profinite identities 

$$(xy)^\omega x = (xy)^\omega = y(xy)^\omega.$$

This is equivalent to saying that for all $s, t, u, v \in A^*$, 

$$s(tu)^\omega tv \in \overline{L} \iff s(tu)^\omega v \in \overline{L} \iff su(tu)^\omega v \in \overline{L}.$$

Now fix an integer $k > 0$. For sufficiently large values of $n$, the words 

$$s(tu)^{n!}tv, \quad s(tu)^{n!}v, \quad su(tu)^{n!}v$$

contain the same subwords of length $k$. Since $L$ is piecewise-testable, it follows that all 
but finitely many of the terms of the three sequences are either all in $L$ or all outside 
of $L$. Since $\overline{L}$ is clopen, the three respective limits are either all in $\overline{L}$ or all outside $\overline{L}$. 

Thus, as we showed in Section 2.5, the syntactic monoid of any piecewise-testable 
language satisfies these same profinite identities 
$(xy)^\omega x = (xy)^\omega = y(xy)^\omega$. That these identities define the pseudovariety $\mathbf{J}$ of finite 
$\mathcal{J}$-trivial monoids is simple to establish. That they completely characterize the variety of 
piecewise-testable languages is the deep content of Simon’s Theorem [43]. 
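
In a finite monoid, $x^\omega$ is simply the unique idempotent power of $x$, so these identities can be tested mechanically. The sketch below does this for a monoid given by its multiplication table; the list-of-rows encoding and the function names are assumptions of this example, not notation from the text.

```python
from itertools import product


def omega_power(table, x):
    """Return x^omega, the unique idempotent among the powers of x."""
    p = x
    while table[p][p] != p:      # walk through x, x^2, x^3, ... until idempotent
        p = table[p][x]
    return p


def satisfies_j_identities(table):
    """Check (xy)^omega x = (xy)^omega = y (xy)^omega for all x, y."""
    n = len(table)
    for x, y in product(range(n), repeat=2):
        e = omega_power(table, table[x][y])          # e = (xy)^omega
        if table[e][x] != e or table[y][e] != e:
            return False
    return True
```

For instance, the two-element monoid $\{0, 1\}$ under multiplication passes this test, whereas the cyclic group of order 2 does not: taking $x$ to be the generator and $y$ the identity, $(xy)^\omega$ is the identity but $(xy)^\omega x$ is the generator.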

2.6.3 Group languages Similarly, the pseudovariety $\mathbf{G}$ of finite groups is defined by the 
profinite identity $x^\omega = 1$. As a consequence, the corresponding variety $\mathcal{G}$ of languages is 
defined by the same profinite identity. In contrast to the other examples presented here, 
we do not possess a simple description of $\mathcal{G}$ in terms of basic operations on words. 


2.6.4 Left-zero semigroups We already appealed to Eilenberg’s Theorem in Section 1 
to show that the class $\mathcal{K}_1$ is not a variety of languages. But we can show here that it 
is a $\mathcal{C}$-variety for a slightly restricted class $\mathcal{C}$ of morphisms. Let $\mathcal{C}_{ne}$ denote the class 
of non-erasing morphisms between finitely generated free monoids, that is, those 
$\varphi: A^* \to B^*$ such that for all $a \in A$, $\varphi(a) \ne 1$. Let $L \in A^*\mathcal{K}_1$. If $s, t, u, v \in A^*$ and 
$t \ne 1$, then $stuv \in L$ if and only if $stv \in L$. Moreover, this property of $L$ characterizes 
membership in $A^*\mathcal{K}_1$. One way to state this property is that the $\mathcal{C}_{ne}$-variety of languages 
$\mathcal{K}_1$ is defined by the $\mathcal{C}_{ne}$-identity $xy = x$. Equivalently, the corresponding 
$\mathcal{C}_{ne}$-pseudovariety $\mathbf{K}_1$ of stamps is defined by the same $\mathcal{C}_{ne}$-identity. This means that 
$(\varphi: A^* \to M) \in \mathbf{K}_1$ if and only if $\varphi(uv) = \varphi(u)$ for all $u, v \in A^+$. 

Alternatively, one may consider, instead of the $\mathcal{C}_{ne}$-pseudovariety generated by the 
syntactic morphisms of languages in $\mathcal{K}_1$, the pseudovariety of finite semigroups generated 
by the images of nonempty words under the syntactic morphisms. This was the approach 
originally taken, but here we prefer to emphasize that all these many different flavors of 
pseudovarieties can be treated in the same general setting. 

2.6.5 Quasiaperiodic stamps Whenever we have a morphism $\varphi: A^* \to M$, the family 
of sets 

$$\{\varphi(A^s) \mid s > 0\}$$

forms a subsemigroup of the power set semigroup $\mathcal{P}(M)$. As this is a finite cyclic 
semigroup, generated by $\varphi(A)$, it contains a unique idempotent. Thus there is some $s > 0$ 
such that $\varphi(A^s) = \varphi(A^{2s})$, so that $\varphi(A^s)$ is a subsemigroup of $M$. We call this the stable 
semigroup of $\varphi$. Let $\mathbf{QA}$ denote the set of stamps $\varphi$ (surjective morphisms from a finitely 
generated free monoid onto a finite monoid) whose stable semigroup is aperiodic. 
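
The stable semigroup can be computed directly from this description by iterating the set products $\varphi(A), \varphi(A^2), \ldots$ until an idempotent of $\mathcal{P}(M)$ is reached. The following sketch assumes that $M$ is given by a multiplication table (a list of rows indexed by 0..n-1) and that the stamp is given by the list of images of the letters; both encodings and the function names are hypothetical choices for this illustration.

```python
def stable_semigroup(table, letter_images):
    """Return (s, phi(A^s)) for the least s > 0 with phi(A^s) = phi(A^{2s})."""
    def set_product(S, T):
        return frozenset(table[a][b] for a in S for b in T)

    generator = frozenset(letter_images)        # phi(A)
    power, s = generator, 1                     # invariant: power == phi(A^s)
    while set_product(power, power) != power:   # phi(A^s) phi(A^s) = phi(A^{2s})
        power, s = set_product(power, generator), s + 1
    return s, power


def is_aperiodic(table, elements):
    """Check x^omega = x^(omega+1) for every x in `elements`."""
    for x in elements:
        p = x
        while table[p][p] != p:                 # p runs through the powers of x
            p = table[p][x]
        if table[p][x] != p:
            return False
    return True
```

As an example, let $M$ be the cyclic group of order 2. The stamp on a one-letter alphabet sending the letter to the generator has stable semigroup $\varphi(A^2) = \{1\}$, which is aperiodic, so this stamp belongs to $\mathbf{QA}$. If a second letter is mapped to the identity, the stable semigroup is all of $M$ and the stamp is not in $\mathbf{QA}$.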

We claim $\mathbf{QA}$ is a $\mathcal{C}_{lm}$-pseudovariety of stamps, where $\mathcal{C}_{lm}$ consists of morphisms 
$\psi: A^* \to B^*$ between finitely generated free monoids such that all $\psi(a)$, where $a \in A$, 
are nonempty words having the same length. (The letters lm stand for length-multiplying, 
since the lengths of all words in $A^*$ are multiplied by a constant factor when $\psi$ is applied.) 
To see this, suppose $(\varphi: B^* \to M) \in \mathbf{QA}$ and $(\psi: A^* \to B^*) \in \mathcal{C}_{lm}$. Let $\varphi(B^s)$ be 
the stable semigroup of $\varphi$, $\varphi\psi(A^t)$ the stable semigroup of $\varphi\psi: A^* \to \mathrm{Im}(\varphi\psi)$, and 
$k$ the length of each $\psi(a)$ for $a \in A$. Then $\varphi\psi(A^t) = \varphi\psi(A^{st}) \subseteq \varphi(B^{kst}) = \varphi(B^s)$, 
and thus the stable semigroup of $\varphi\psi$ is also aperiodic. Further, if the stable semigroups 
$\varphi_j(A^{s_j})$ of stamps $\varphi_j: A^* \to M_j$, for $j = 1, 2$, are aperiodic, then the stable semigroup 
of $\varphi_1 \times \varphi_2$ is contained in $\varphi_1(A^{s_1}) \times \varphi_2(A^{s_2})$, and is therefore aperiodic. Thus $\mathbf{QA}$ is 
a $\mathcal{C}_{lm}$-pseudovariety, and is accordingly defined by a set of profinite $\mathcal{C}_{lm}$-identities. What 
does it mean for a stamp $\varphi: A^* \to M$ to satisfy a $\mathcal{C}_{lm}$-identity $u = v$? In such an identity, 
$u$ and $v$ are elements of $\widehat{X^*}$ for some finite alphabet $X$. The identity is satisfied if for every 
morphism $\psi: X^* \to A^*$ in $\mathcal{C}_{lm}$, $\hat\varphi(\hat\psi(u)) = \hat\varphi(\hat\psi(v))$. Informally, this says that so long as 
we replace the letters in $u$ and $v$ by elements of $A^+$ that all have the same length, the 
images in $M$ are identical. We claim that $\mathbf{QA}$ is defined by the single profinite $\mathcal{C}_{lm}$-identity 

$$(x^{\omega-1}y)^\omega = (x^{\omega-1}y)^{\omega+1}.$$

Let us prove this. First, we show that $\mathbf{QA}$ satisfies the identity. Let $(\varphi: A^* \to M) \in 
\mathbf{QA}$, and choose $p > 0$ such that for all $m \in M$, $m^p$ is idempotent. We then also have 
$m^{ps}$ idempotent for all $m \in M$, where $\varphi(A^s)$ is the stable semigroup of $\varphi$. If the identity 
is not satisfied, then there exist words $u$ and $v$ in $A^*$, both of length $k > 0$, such that 

$$(\varphi(u^{ps-1}v))^{ps} \ne (\varphi(u^{ps-1}v))^{ps+1}.$$

Thus $\{(\varphi(u^{ps-1}v))^{ps+r} \mid r \ge 0\}$ is a nontrivial group in $\varphi((A^s)^+) = \varphi(A^s)$, 
contradicting membership in $\mathbf{QA}$. Conversely, suppose a stamp $\varphi: A^* \to M$ satisfies the 
identity. Suppose the stable semigroup $\varphi(A^s)$ contains a group element $g = \varphi(u)$, with 
$|u| = s$. Let $e = \varphi(v)$, where $|v| = s$, be the identity of this group. Since $\varphi$ satisfies the 
identity, 

$$e = \hat\varphi((u^{\omega-1}v)^\omega) = \hat\varphi((u^{\omega-1}v)^{\omega+1}) = g^{-1},$$

so $g = e$ and every group in $\varphi(A^s)$ is trivial. 

We introduced the $\mathcal{C}_{lm}$-pseudovariety $\mathbf{QA}$ in Section 1 in quite different terms, by 
giving a logical description of the corresponding $\mathcal{C}_{lm}$-variety of languages. We will show 
in Section 3 that they do in fact correspond. 

2.6.6 $\Sigma_1$-languages As in Section 1.2.2, we denote by $A^*\mathcal{J}^+$ the family of languages 
over $A$ defined by $\Sigma_1$ sentences. Languages in this family are precisely the finite unions 
of the languages $L_v$, where $v \in A^*$. We claim that $\mathcal{J}^+$ is defined by the profinite ordered 
identity $x \le 1$. A language $L$ satisfies this identity if and only if for all $u, v, w \in A^*$, 
whenever $uw \in L$, then $uvw \in L$. Clearly, each $L_v$ satisfies this identity. We must show, 
conversely, that any language satisfying this identity is a finite union of $L_v$ for various 
$v \in A^*$. Certainly, if $L$ satisfies the identity and $v \in L$, then $L_v \subseteq L$, so that 

$$L = \bigcup_{v \in L} L_v.$$

We need to show that this can be replaced by a finite union. Let $T$ consist of the 
subword-minimal elements of $L$, that is, those $v \in L$ such that no proper subword of $v$ is 
in $L$. Then 

$$L = \bigcup_{v \in T} L_v.$$

We now invoke a theorem of G. Higman [24]: the subword ordering in $A^*$ has no infinite 
antichains. That is, any set $T$ of words in which no element is a strict subword of another 
element is finite. Hence $T$ is finite, and $L$ is a finite union of languages $L_v$. 

The corresponding ordered pseudovariety $\mathbf{J}^+$ consequently consists of all partially 
ordered finite monoids for which the identity 1 is the maximum element, and thus a language 
belongs to $A^*\mathcal{J}^+$ if and only if its ordered syntactic monoid satisfies this property. 
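
The passage from a generating set to its subword-minimal elements is easy to illustrate on a finite set of words. The sketch below assumes, as the argument above suggests, that $L_v$ is the set of words having $v$ as a (scattered) subword; the function names are hypothetical, and the code handles only finite sets of generators, not arbitrary regular languages.

```python
def is_subword(v, w):
    """True if v can be obtained from w by deleting letters."""
    remaining = iter(w)
    # Each membership test consumes `remaining` up to the matched letter,
    # so the letters of v are matched from left to right.
    return all(letter in remaining for letter in v)


def minimal_elements(words):
    """Return the subword-minimal elements of a finite set of words."""
    words = set(words)
    return {v for v in words
            if not any(u != v and is_subword(u, v) for u in words)}


def in_upward_closure(w, generators):
    """True if w belongs to the union of the languages L_v, v in generators."""
    return any(is_subword(v, w) for v in generators)
```

For the generators `{"ab", "ba", "aab", "abb"}`, the minimal elements are `"ab"` and `"ba"`, and both sets generate the same union of languages $L_v$.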


2.6.7 Languages with zero All of our examples so far have concerned some flavor of 
varieties of languages, language families that are defined across all finite alphabets and 
are closed under inverse images of morphisms between free monoids. Part of the great 
novelty of the equational theory of Gehrke et al. [22] presented here is that it applies to 
language classes with weaker closure properties. Here we give a simple example. 

We say a regular language $L \subseteq A^*$ is a language with zero if $\mathrm{Synt}(L)$ has a zero. 
This is equivalent to saying that there is a two-sided ideal $J$ in $A^*$ such that either $J \subseteq L$ 
or $L \cap J = \emptyset$. This property is easily seen to be closed under boolean operations and 
quotients. It is not, however, closed under inverse images of morphisms in any 
composition-closed class $\mathcal{C}$ that contains the length-preserving morphisms. Indeed, let 
$L \subseteq A^*$ be any regular language without a zero, and let $b$ be a new letter. Then, viewed as a 
subset of $(A \cup \{b\})^*$, $L$ has a zero, so this class is not closed under the inverse image 
of the length-preserving morphism that embeds $A^*$ in $(A \cup \{b\})^*$. Nonetheless, by our 
Corollary 2.10, this class of languages is defined by a set of profinite inequalities. 

We now exhibit such a set of inequalities. We start by defining three sequences of 
words in $A^*$. Let 

$$u_1, u_2, \ldots$$

be any enumeration of the elements of $A^*$, let 

$$v_n = u_1 \cdots u_n$$

and 

$$w_1 = 1, \quad w_{n+1} = (w_n v_n w_n)^{n!}.$$

Look at the image of the $w_i$ under a surjective morphism $\varphi: A^* \to M$, where $M$ is finite. 
Since every $u \in A^*$ occurs as a factor of all but finitely many $w_i$, almost all $\varphi(w_i)$ are in 
the minimal ideal $K$ of $M$. Since for all $m \in M$, $m^{n!}$ is idempotent for sufficiently large 
$n$, almost all $\varphi(w_i)$ are idempotents in the minimal ideal of $M$. Finally, if $\varphi(w_i)$ is such 
an idempotent $e$, then $\varphi(w_{i+1})$ is an idempotent in $eKe$, which is a group with identity $e$, 
and so is itself equal to $e$. Thus for every finite monoid, the sequence $(\varphi(w_n))_n$ is 
convergent, so $(w_n)_n$ converges to an element $\rho_A$ of $\widehat{A^*}$, such that $\hat\varphi(\rho_A)$ is an 
idempotent in the minimal ideal of $\varphi(A^*)$. 

Suppose $L \subseteq A^*$ has a zero. Then the minimal ideal of $\mathrm{Synt}(L)$ consists of this 0 
alone, so if $\eta$ is the syntactic morphism of $L$ and $a \in A$, $\hat\eta(\rho_A) = \hat\eta(a\rho_A) = \hat\eta(\rho_A a)$. 
Thus $L$ satisfies the equalities 

$$a\rho_A = \rho_A = \rho_A a$$

for all $a \in A$. Conversely, if $L$ satisfies these equalities, then the minimal ideal of $\eta(A^*)$ 
contains just one element, so $L$ is a language with zero. So these equalities define the 
class of languages with zero. 
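
On the algebraic side, the defining condition that $\mathrm{Synt}(L)$ has a zero is straightforward to test once a multiplication table is available. The following minimal sketch assumes the table is a list of rows indexed by 0..n-1; the function name `find_zero` is a hypothetical choice.

```python
def find_zero(table):
    """Return the zero of a finite monoid, or None if there is none.

    An element z is a zero if z*m == m*z == z for every element m.
    """
    n = len(table)
    for z in range(n):
        if all(table[z][m] == z and table[m][z] == z for m in range(n)):
            return z
    return None
```

For example, the monoid $\{0, 1\}$ under multiplication has a zero, whereas a nontrivial group has none.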

2.6.8 Languages defined by density Say that a language $L \subseteq A^*$ is dense if every word 
of $A^*$ occurs as a factor of a word in $L$, that is, $L \cap A^*uA^* \ne \emptyset$ for every $u \in A^*$. The set 
consisting of $A^*$ and the non-dense languages forms a quotient-closed lattice, which is 
defined by the profinite inequalities $x \le 0$ ($x \in A^*$); this is short for $a\rho_A = \rho_A a = \rho_A$ 
for every $a \in A$ and $x \le \rho_A$ for every $x \in A^*$ [22]. 

Now define the density of a language $L$ as the function $d_L(n)$ which counts the number 
of words of length $n$ in $L$. A language with bounded density (also called slender) is 
easily seen to be a finite union of languages of the form $xu^*y$ ($x, u, y \in A^*$). Similarly, 
a language of polynomial density, also called sparse, can be shown to be a finite union 
of languages of the form $u_0v_1^*u_1 \cdots v_n^*u_n$, where the $u_i$ and $v_j$ are in $A^*$. Together with 
$A^*$, the set of slender (resp. sparse) languages in $A^*$ forms a quotient-closed lattice of 
languages, for which defining profinite inequalities can be found in [22]. 
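
The density function of a regular language is easy to tabulate from a DFA by counting, for each length, how many words lead to each state. The sketch below assumes a complete DFA encoded as a dictionary `delta[state][letter]`; this encoding and the function name `density` are assumptions made for the example.

```python
def density(delta, initial, finals, max_len):
    """Return [d_L(0), ..., d_L(max_len)] for the language of a complete DFA."""
    counts = {initial: 1}     # counts[q] = number of words of current length reaching q
    result = []
    for _ in range(max_len + 1):
        result.append(sum(c for q, c in counts.items() if q in finals))
        following = {}
        for q, c in counts.items():
            for target in delta[q].values():
                following[target] = following.get(target, 0) + c
        counts = following
    return result


# A DFA for a*b over {a, b}: state 0 is initial, 1 is accepting, 2 is a sink.
A_STAR_B = {0: {"a": 0, "b": 1},
            1: {"a": 2, "b": 2},
            2: {"a": 2, "b": 2}}
```

Here `density(A_STAR_B, 0, {1}, 4)` yields `[0, 1, 1, 1, 1]`, consistent with $a^*b$ being slender (it has the form $xu^*y$ with $x = 1$, $u = a$, $y = b$).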


2.7 Deciding membership in an equationally defined class of languages 

We are often interested in decision problems for families of regular languages: We say 
that a family $\mathcal{F}$ of regular languages over a finite alphabet $A$ is decidable if there is an 
algorithm that, given a regular language $L \subseteq A^*$ as input, determines whether $L \in \mathcal{F}$. 
Here a regular language $L$ is ‘given’ by specifying a DFA that recognizes $L$, or some other 
formalism (e.g., regular expression, logical formula) from which a DFA can be effectively 
computed. The problem arises, for example, if we are looking for a test of whether a given 
language is expressible in some logic for defining regular languages. (See Section 3.) 

We can similarly define decidable families of finite monoids: Such a family $\mathcal{F}$ is 
decidable if there is an algorithm that, given the multiplication table for a finite monoid 
$M$, determines whether $M \in \mathcal{F}$. The definition extends in the obvious fashion to families 
of ordered monoids and stamps. For ordered monoids the input includes, in addition to 
the multiplication table of $M$, a representation of the graph of the partial order on $M$. For 
stamps $\varphi: A^* \to M$ we are also given the values $\varphi(a)$ for $a \in A$. 




We will say that a variety $\mathcal{V}$ of languages is decidable if $A^*\mathcal{V}$ is decidable for every 
finite alphabet $A$. In this case the Eilenberg correspondence theorem gives a rather obvious 
connection between the two kinds of decidable families: 

Theorem 2.18. A (positive) variety (respectively, C-variety) of languages is decidable if 
and only if the corresponding pseudovariety of (ordered) monoids (respectively, stamps) 
is decidable. 

Proof. We give the proof just for the case of ordinary varieties of languages and 
pseudovarieties of monoids; the argument is essentially the same for all the other variants. Let 
$\mathcal{V}$ be a variety of languages and $\mathbf{V}$ the corresponding pseudovariety of monoids. Suppose 
first that $\mathbf{V}$ is decidable. Let $\mathcal{A} = (Q, A, i, F)$ be a DFA recognizing a language $L \subseteq A^*$. 
From $\mathcal{A}$ we can effectively construct the multiplication table of $\mathrm{Synt}(L)$. We then apply 
the algorithm for $\mathbf{V}$ to decide whether $\mathrm{Synt}(L) \in \mathbf{V}$, and thus whether $L \in A^*\mathcal{V}$. 
Conversely, suppose $\mathcal{V}$ is decidable. Let $M$ be a finite monoid and choose a finite alphabet 
$A$ together with a surjective morphism $\varphi: A^* \to M$. (For example, we could choose 
$A = M$ and $\varphi$ the extension to $A^*$ of the identity map on $M$.) Then by Lemma 2.16 
and Corollary 2.17, $M$ divides the direct product of the monoids $\mathrm{Synt}(\varphi^{-1}(m))$ for 
$m \in M$, and each of the $\mathrm{Synt}(\varphi^{-1}(m))$ in turn divides $M$. Thus $M \in \mathbf{V}$ if and only 
if each of the languages $\varphi^{-1}(m)$ is in $A^*\mathcal{V}$. Furthermore, from $\varphi$ we can construct a DFA 
$(M, A, 1, \{m\})$ recognizing $\varphi^{-1}(m)$, and thus decide whether each is in $A^*\mathcal{V}$. Thus $\mathbf{V}$ is 
decidable. □ 
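
The first step of this proof, constructing a multiplication table from a DFA, can be sketched as follows. The code computes the transition monoid of the automaton, which is isomorphic to $\mathrm{Synt}(L)$ when the DFA is minimal and complete; we assume this here rather than minimizing. The dictionary encoding `delta[state][letter]` and the function name are hypothetical.

```python
def transition_monoid(states, alphabet, delta):
    """Return (elements, table) for the transition monoid of a complete DFA.

    Each element is a tuple f where f[i] is the index of the state reached
    from states[i]; elements[0] is the identity, and table[i][j] is the
    index of the product elements[i] * elements[j] (read i first, then j).
    """
    index = {q: i for i, q in enumerate(states)}
    n = len(states)
    generators = [tuple(index[delta[q][a]] for q in states) for a in alphabet]

    def compose(f, g):
        return tuple(g[f[i]] for i in range(n))     # q(uv) = (qu)v

    identity = tuple(range(n))
    elements, position, pending = [identity], {identity: 0}, [identity]
    while pending:                                  # close under right multiplication
        f = pending.pop()
        for g in generators:
            h = compose(f, g)
            if h not in position:
                position[h] = len(elements)
                elements.append(h)
                pending.append(h)
    table = [[position[compose(f, g)] for g in elements] for f in elements]
    return elements, table
```

For the two-state DFA over $\{a\}$ accepting the words of even length, this returns the multiplication table of the cyclic group of order 2. Note that the number of elements can be exponential in the number of states, a point taken up again below.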

Decision problems for varieties of regular languages can have arbitrarily large 
computational complexity, or indeed be undecidable. To see this, observe simply that if $P$ is any 
set of primes, then we can form the pseudovariety $\mathbf{G}_P$ of finite groups $G$ such that every 
prime divisor of $|G|$ is in $P$. Testing membership of a given prime $p$ in $P$ then reduces, in 
time polynomial in $p$, to testing membership in $\mathbf{G}_P$ (the cyclic group of order $p$ belongs to 
$\mathbf{G}_P$ if and only if $p \in P$), so $\mathbf{G}_P$ is at least as complex as $P$. 

On the other hand, Reiterman’s theorem, which says varieties are defined by sets of 
profinite identities, suggests that we could determine membership in varieties simply by 
verifying whether identities hold in finite monoids. This is deceptive, since elements 
of $\widehat{X^*}$ do not generally have simple descriptions that make it possible to evaluate their 
images in finite monoids, and, further, the equational description of a pseudovariety might 
require infinitely many profinite identities. We can nonetheless say something definitive 
about the complexity of the decision problems in the case where the equational definition 
consists of a finite set of profinite identities $\rho = \sigma$, where $\rho$ and $\sigma$ are $\omega$-terms in $\widehat{X^*}$. 
This means that $\rho$ and $\sigma$ are formed from elements of $X$ by successive application of 
concatenation and the operation $t \mapsto t^\omega$. 

Theorem 2.19. Let $\mathcal{V}$ be a variety of languages defined by a finite set of profinite 
identities of the form $\rho = \sigma$, where $\rho$ and $\sigma$ are $\omega$-terms, and let $\mathbf{V}$ be the corresponding 
pseudovariety of finite monoids. Then $\mathbf{V}$ is decidable by a logspace algorithm in the size 
of the input multiplication table, and $\mathcal{V}$ is decidable by a polynomial space algorithm in 
the size of the input automaton. 

Proof. We first consider testing membership of a monoid $M$ in $\mathbf{V}$. Let $|M| = n$. The 
multiplication table of $M$ can be represented in $O(n^2 \log n)$ bits and each element of $M$ 




H. Straubing, P. Weil 


by $O(\log n)$ bits. We will show how to determine membership of $M$ in $\mathbf{V}$ using $k \cdot \log_2 n$ 
additional bits of workspace, where the constant $k$ is determined by the length of the 
longest $\omega$-term occurring in the defining profinite identities for $\mathbf{V}$. To make the proof 
easier to follow, let us suppose we have an identity $((x^\omega y)^\omega z)^\omega = (xyz)^\omega$. The algorithm 
loops through all triples $(x, y, z)$ of elements of $M$ and writes them in the workspace. It 
then uses $\log_2 n$ bits of additional workspace to compute $x^\omega$. This is done by repeatedly 
consulting the multiplication table, writing $x^2, x^3, \ldots$ in the same workspace, and after 
each write, consulting the multiplication table to check if the element is idempotent. We 
similarly compute $(x^\omega y)^\omega$, $((x^\omega y)^\omega z)^\omega$, and $(xyz)^\omega$. All in all, we used $7 \cdot \log_2 n$ bits of 
workspace. After all the values are computed, we compare the last two. The algorithm 
rejects if it finds a mismatch. If it finds none, it goes on to the next identity, and accepts if 
all the identities are tested with no mismatch. 
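
The evaluation of $\omega$-terms described in this part of the proof can be sketched in Python as follows. The sketch illustrates only the logic of the test, not the logarithmic space bound. The representation of terms (a string for a variable, a list for a concatenation, a pair `("omega", t)` for $t^\omega$) and all function names are assumptions made for this example.

```python
from itertools import product


def omega_power(table, x):
    """Return the idempotent power of x, found by scanning x, x^2, x^3, ..."""
    p = x
    while table[p][p] != p:
        p = table[p][x]
    return p


def evaluate(term, table, one, env):
    """Evaluate an omega-term in a finite monoid with identity `one`."""
    if isinstance(term, str):                     # a variable
        return env[term]
    if isinstance(term, tuple):                   # ("omega", t)
        return omega_power(table, evaluate(term[1], table, one, env))
    value = one                                   # a list: concatenation
    for factor in term:
        value = table[value][evaluate(factor, table, one, env)]
    return value


def satisfies(table, one, variables, lhs, rhs):
    """Check the identity lhs = rhs under every assignment of the variables."""
    for values in product(range(len(table)), repeat=len(variables)):
        env = dict(zip(variables, values))
        if evaluate(lhs, table, one, env) != evaluate(rhs, table, one, env):
            return False
    return True
```

For instance, the identity $x^\omega = 1$ of Section 2.6.3 is written `satisfies(table, one, ["x"], ("omega", "x"), [])`, the empty list standing for the empty word.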

We now turn to testing membership in $\mathcal{V}$. The algorithm we give is actually a
nondeterministic polynomial space algorithm for nonmembership of a regular language in
$A^*\mathcal{V}$. Since, by Savitch’s Theorem ([40], see also Sipser [44]), nondeterministic
polynomial space is equivalent to deterministic polynomial space, and the latter is closed under
complement, this will be enough. Let us work with the same example identity we used in
the first part of the proof. The algorithm begins by guessing words $x, y, z$ and computing
the vectors

$$(q_1 x, \ldots, q_n x), \qquad (q_1 y, \ldots, q_n y), \qquad (q_1 z, \ldots, q_n z),$$

where $\{q_1, \ldots, q_n\}$ is the set of states of the input DFA. Observe that the words $x, y, z$
themselves are not stored. Instead they are guessed letter by letter, and only the vectors of
states are written in the workspace. This requires $O(n \log n)$ bits, where $n$ is the number
of states of the DFA. Observe as well that once we have the vector $(q_1 u, \ldots, q_n u)$ we can,
with an additional $n \log_2 n$ bits, compute the vector $(q_1 u^\omega, \ldots, q_n u^\omega)$, since
we can write the vectors of the successive powers $(q_1 u^k, \ldots, q_n u^k)$ reusing the same
workspace, and then check after each write whether $q u^k = q u^{2k}$ for each state $q$. As a
result we obtain the vectors $(q_1 \varphi(\rho), \ldots, q_n \varphi(\rho))$ and
$(q_1 \varphi(\sigma), \ldots, q_n \varphi(\sigma))$ for some morphism $\varphi: X^* \to A^*$.
If these vectors turn out to be different, we accept. Thus this algorithm nondeterministically
recognizes the complement of $A^*\mathcal{V}$, using $O(n \log n)$ space. □
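
The first part of the proof, which tests identities on a multiplication table, can be sketched in Python. This is only an illustration under assumptions that are not in the text: the monoid is given as a list-of-lists table on the elements 0, ..., n-1, the helper names `omega` and `satisfies` are hypothetical, and the two sample identities (aperiodicity, $x^\omega x = x^\omega$, and commutativity, $xy = yx$) are chosen for the demonstration. The sketch makes no attempt to respect the logarithmic space bound.

```python
from itertools import product

def omega(table, x):
    """Idempotent power of x, found by writing x, x^2, x^3, ... in one cell."""
    p = x
    while table[p][p] != p:      # p is idempotent exactly when p * p == p
        p = table[p][x]          # overwrite p with the next power of x
    return p

def satisfies(table, arity, lhs, rhs):
    """Test the identity lhs = rhs on every tuple of elements of the monoid."""
    def mul(a, b):
        return table[a][b]
    def om(a):
        return omega(table, a)
    return all(lhs(mul, om, *v) == rhs(mul, om, *v)
               for v in product(range(len(table)), repeat=arity))

# Sample identities (hypothetical choices for this demonstration).
aperiodic = (1, lambda mul, om, x: mul(om(x), x), lambda mul, om, x: om(x))
commutative = (2, lambda mul, om, x, y: mul(x, y), lambda mul, om, x, y: mul(y, x))

U1 = [[0, 0], [0, 1]]   # {0, 1} under multiplication
Z2 = [[0, 1], [1, 0]]   # integers modulo 2 under addition

print(satisfies(U1, *aperiodic))
print(satisfies(Z2, *aperiodic))
print(satisfies(Z2, *commutative))
```

On these two tables the monoid $U_1$ passes the aperiodicity test, while the two-element group fails it but passes the commutativity test.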


The foregoing theorem illustrates a potentially large gap in complexity between testing
membership in $\mathcal{V}$ from an input DFA and testing membership in the corresponding
pseudovariety $\mathbf{V}$ from the multiplication table of a monoid. This is to be expected, since
an automaton is in general exponentially more succinct than the multiplication table of
its transition monoid. In some instances, however, it is possible to give efficient algorithms
that begin with automata, using so-called ‘forbidden pattern’ characterizations of
varieties. We illustrate this with a very simple example, using the ordered variety $\mathcal{J}^+$.
Consider the following figure:

[Figure: forbidden pattern on two states $q_1$ and $q_2$.]

We say that a DFA $(Q, A, i, F)$ contains this pattern if there are states $q_1, q_2$ and words
$u, v, w \in A^*$ such that $iu = q_1$, $q_2 = q_1 v$, $q_1 w \in F$, $q_2 w \notin F$. We say the
DFA avoids the pattern if it does not contain it. It is easy to see that a DFA recognizing a
language $L$ avoids this pattern if and only if, for all words $u, v, w$, $uvw \in L$ whenever
$uw \in L$. Thus the languages in $A^*$ avoiding the pattern are exactly those that satisfy the
inequality $x \le 1$; that is, the language family $A^*\mathcal{J}^+$. We use this to prove the
following:

Theorem 2.20. There is an algorithm determining membership in $\mathcal{J}^+$ that runs in
nondeterministic logspace in the size of an accepting DFA. (In particular, membership can be
determined in polynomial time.)

Proof. We nondeterministically guess letters to obtain an accessible state $q_1$, using $\log_2 n$
bits, where $n$ is the number of states in the automaton. We then further guess letters to
obtain another state $q_2 = q_1 v$, written on another $\log_2 n$-bit field in the workspace.
Finally, we guess more letters, applying them to both components of the pair $(q_1, q_2)$, and
arrive at a pair of states $(q_1 w, q_2 w)$. We accept if the first member of this pair is an
accepting state of the DFA and the second is not. Thus we have a nondeterministic logspace
algorithm for the regular languages outside of $\mathcal{J}^+$. But by the theorem of Immerman
and Szelepcsényi (see [26], [51], also [44]), nondeterministic logspace is closed under
complement, so we have the desired result.

□ 
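
The parenthetical remark on polynomial time can be made concrete with a deterministic search for the forbidden pattern. The sketch below assumes a complete DFA whose transitions are given as a dictionary from letters to tuples of target states; this encoding and the names `reachable` and `contains_pattern` are hypothetical. It replaces the nondeterministic guesses of the proof by breadth-first search on states and on pairs of states.

```python
from collections import deque

def reachable(start, successors):
    """Breadth-first search from `start` using the `successors` function."""
    seen, queue = {start}, deque([start])
    while queue:
        s = queue.popleft()
        for t in successors(s):
            if t not in seen:
                seen.add(t)
                queue.append(t)
    return seen

def contains_pattern(delta, initial, accepting):
    """delta maps each letter to a tuple t with t[q] = the state reached from q."""
    def step(q):
        return [t[q] for t in delta.values()]
    def pair_step(pq):
        return [(t[pq[0]], t[pq[1]]) for t in delta.values()]
    for q1 in reachable(initial, step):                   # q1 = i.u
        for q2 in reachable(q1, step):                    # q2 = q1.v
            for p, q in reachable((q1, q2), pair_step):   # (q1.w, q2.w)
                if p in accepting and q not in accepting:
                    return True
    return False

# Two-state DFA over {a, b}: state 1 is reached once an a has been read.
delta = {"a": (1, 1), "b": (0, 1)}
print(contains_pattern(delta, 0, {1}))   # language of words containing an a
print(contains_pattern(delta, 0, {0}))   # language of words containing no a
```

The first language is closed under inserting letters, so its DFA avoids the pattern; the second is not, and the pattern is found with $u = w = 1$ and $v = a$.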

The same reasoning is used in many proofs showing that varieties of languages are
decidable in nondeterministic logspace: find a forbidden pattern characterization of the
variety using a fixed number of states. (See, for instance, Pin and Weil [37], Glasser and
Schmitz [23].) While such results appear to bridge the complexity gap between polynomial-time
algorithms that begin with a multiplication table and exponential-time algorithms
that begin with an automaton, forbidden pattern arguments are not always available. In
particular, we have the following result, which we cite without proof, from Cho and
Huynh [17]:

Theorem 2.21. Testing whether a regular language given by a DFA is aperiodic is 
PSPACE-complete. 


3 Connections with logic 

In Section 1 we outlined, in an informal way, some of the logical apparatus for expressing
properties of words over a finite alphabet. Here we give a more precise and general
description. As before, variable symbols $x, y, x_1, x_2$, etc., denote positions in a word. For
each $a \in A$ our logics have a unary predicate symbol $Q_a$, where $Q_a x$ is interpreted to
mean ‘the symbol in position $x$ is $a$’. We also have a binary predicate symbol $s$, where
$s(x, y)$ is interpreted to mean ‘position $y$ is the successor of position $x$’. We will usually
use the alternative notation $y = x + 1$ for this.

We now consider monadic second-order formulas over this base of predicates. These
are formulas built not merely by quantifying over individual positions, but also by
quantifying over sets of positions, denoted by upper-case variable letters, and employing an
additional relation symbol $x \in X$ between positions (first-order variables) and sets of
positions (second-order variables).

For example, consider the monadic second-order formula $\varphi$:

$$\exists x\, \exists y\, \exists X\, (Q_a x \wedge Q_b y \wedge x \in X \wedge y \in X \wedge \varphi_1 \wedge \varphi_2),$$

where $\varphi_1$ is

$$\neg \exists z\, (x = z + 1 \wedge z \in X) \wedge \neg \exists z\, (z = y + 1 \wedge z \in X),$$

and $\varphi_2$ is

$$\forall z\, (z \in X \to (y = z \vee \exists u\, (u \in X \wedge u = z + 1))).$$

The formula $\varphi$ is a sentence; that is, it has no free variables. Thus $\varphi$ defines a
language $L_\varphi$ over $A = \{a, b\}$, namely the set of all words in which the formula is true.
The sentence asserts the existence of positions $x$ and $y$ with letters $a$ and $b$ respectively,
and of a set $X$ of positions that contains both $x$ and $y$, that contains the successor of each
of its elements with the exception of $y$, and that contains no elements less than $x$. Thus
$L_\varphi$ is the regular language $A^*aA^*bA^*$.
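
The claim about this sentence can be checked by brute force on short words. The sketch below is an illustration only: it evaluates the sentence as read here (with $\varphi_2$ exempting the position $y$) by enumerating all positions $x, y$ and all sets $X$ of positions, and compares the result with a direct membership test for $A^*aA^*bA^*$. The function names are hypothetical.

```python
from itertools import combinations, product

def satisfies_phi(w):
    """Evaluate the monadic second-order sentence on the word w by enumeration."""
    n = len(w)
    subsets = [set(c) for r in range(n + 1) for c in combinations(range(n), r)]
    for x, y in product(range(n), repeat=2):
        if w[x] != "a" or w[y] != "b":
            continue
        for X in subsets:
            if x not in X or y not in X:
                continue
            # phi_1: the predecessor of x and the successor of y are not in X
            phi1 = (x - 1) not in X and (y + 1) not in X
            # phi_2: every element of X other than y has its successor in X
            phi2 = all(z == y or (z + 1) in X for z in X)
            if phi1 and phi2:
                return True
    return False

def in_language(w):
    """Membership in A*aA*bA*: some a occurs strictly before some b."""
    return any(w[i] == "a" and w[j] == "b"
               for i in range(len(w)) for j in range(i + 1, len(w)))

words = ["".join(t) for n in range(6) for t in product("ab", repeat=n)]
print(all(satisfies_phi(w) == in_language(w) for w in words))
```

The enumeration of subsets is exponential in the length of the word, which is harmless for the 63 words of length at most 5 tested here.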

This example is an instance of the following important theorem, due to J. R. Büchi [16]
(see [29, 49]).

Theorem 3.1. A language $L \subseteq A^*$ is regular if and only if $L = L_\varphi$ for some
sentence $\varphi$ of monadic second-order logic.

We obtain subclasses of regular languages by restricting these second-order formulas
in various ways. One obvious such restriction is to study first-order formulas: those
formulas that use no second-order quantification. We denote this logic, as well as the
family of regular languages that can be defined in it, by FO[+1]. More generally, consider
any $k$-ary relation $\alpha$ on the set of positions in a word that does not depend on the letters
that appear in the word. Suppose further that $\alpha(x_1, \ldots, x_k)$ is definable by a formula
of monadic second-order logic. Then we obtain a subclass of the regular languages by
considering those languages definable by first-order sentences in which $\alpha$ is allowed as
an atomic formula. We denote this class FO[$\alpha$], and similarly write
FO[$\alpha_1, \alpha_2, \ldots$] when there are several such predicates. For example, the relation
$x < y$ is definable in monadic second-order logic, by a formula much like the one used above
to define the language $L = A^*aA^*bA^*$. Thus we obtain the logic and the language class
FO[<]. Of course, $L$ is definable in this logic, by the very simple sentence

$$\exists x\, \exists y\, (Q_a x \wedge Q_b y \wedge x < y).$$

We can extend the expressive power further, by adjoining, for $k > 1$, a binary predicate
$\equiv_k$ that says two positions are equivalent modulo $k$. These predicates, too, are
definable in monadic second-order logic, and thus we obtain language classes
FO[<, $\equiv_k$]. We can further restrict these families by bounding the quantifier depth, or
the alternation of existential and universal quantifiers, or the number of distinct variable
symbols.

We are interested in understanding the expressive power of these logics, and determining
exactly what languages can be defined in them. The critical insight is that (nearly) all
these language classes are varieties. In some instances we obtain ordered varieties,
in others $\mathcal{C}$-varieties for a class $\mathcal{C}$ of morphisms, but in all cases we
obtain families that, at least in principle, admit a characterization in terms of the syntactic
monoids and morphisms of the languages they contain.


3.1 Model-theoretic games 

To see why this is so, we first describe an important tool for studying the expressive power
of logics for words. Consider a first-order logic FO[$\alpha_1, \ldots, \alpha_m$]. Look at a pair
of words $w, w' \in A^*$ and suppose that on each word we have placed $k$ ‘pebbles’, labeled
$x_1, \ldots, x_k$ for $w$, and $x'_1, \ldots, x'_k$ for $w'$. Each pebble is placed on a single
position in its word, but two different pebbles can be on the same position. We denote the
resulting pebbled words by $u = (w, x_1, \ldots, x_k)$ and $u' = (w', x'_1, \ldots, x'_k)$.

We will now describe a game $\mathcal{G}_r(u, u', \alpha_1, \ldots, \alpha_m)$ played on these two
pebbled words. (This is called an Ehrenfeucht-Fraïssé game.) The subscript $r$ denotes the number
of rounds of the game. There are two players, traditionally called Spoiler, who plays first,
and Duplicator, who plays second. We define the rules of the game by induction on the
number of rounds. In the 0-round game, the winner is already determined: If there is a
relation $\alpha = \alpha_j$ of arity $p$, and pebbles $x_{i_1}, \ldots, x_{i_p}$,
$x'_{i_1}, \ldots, x'_{i_p}$, such that

$$\alpha(x_{i_1}, \ldots, x_{i_p})$$

holds, and

$$\alpha(x'_{i_1}, \ldots, x'_{i_p})$$

does not, or vice-versa, then Spoiler wins the game. If there are pebbles $x_i$ and $x'_i$ such
that the letter in position $x_i$ of $w$ is different from the letter in position $x'_i$ of $w'$,
then Spoiler also wins the game. Otherwise, Duplicator wins. The idea is that Spoiler wins if
the two pebbled words are different, and the difference must be witnessed by the atomic
formulas applied to the pebbled positions.

Now let $r > 0$. In the $r$-round game $\mathcal{G}_r(u, u', \alpha_1, \ldots, \alpha_m)$, Spoiler
makes a play by placing a new pebble $x_{k+1}$ in $u$ or $x'_{k+1}$ in $u'$. If Spoiler played in
$u$ then Duplicator must respond with $x'_{k+1}$ in $u'$. Otherwise Duplicator responds with
$x_{k+1}$ in $u$. The result is two new pebbled words $v, v'$. Spoiler and Duplicator proceed to
play the game $\mathcal{G}_{r-1}(v, v', \alpha_1, \ldots, \alpha_m)$. Whoever wins this
$(r-1)$-round game is the winner of the $r$-round game.

Ordinary words may be considered as special instances of pebbled words and thus
we can consider the games $\mathcal{G}_r(w, w', \alpha_1, \ldots, \alpha_m)$, where
$w, w' \in A^*$. The fundamental property of such games is given by the following theorem.

Theorem 3.2. Let $w, w' \in A^*$, $r \ge 0$. The words $w$ and $w'$ satisfy the same sentences
in FO[$\alpha_1, \ldots, \alpha_m$] of quantifier depth $r$ or less if and only if Duplicator has
a winning strategy in $\mathcal{G}_r(w, w', \alpha_1, \ldots, \alpha_m)$.

See, for example, [29, 49].

Here is an example: Consider the two words $w = aab$ and $w' = aaab$. Spoiler has a
winning strategy in $\mathcal{G}_2(w, w', <)$: First play pebble $x'_1$ on the second $a$ of $w'$.
If Duplicator replies on the first $a$ of $w$, Spoiler will play $x'_2$ on the first $a$ of $w'$.
If Duplicator instead replies on the second $a$ of $w$, then Spoiler plays $x'_2$ on the third
$a$ of $w'$. In either case, Duplicator has nowhere to play $x_2$ in $w$ and win the game. By
Theorem 3.2, there must be some sentence of quantifier depth 2 that distinguishes the two
words. Indeed, $w'$ satisfies

$$\exists x\, (Q_a x \wedge \exists y\, (Q_a y \wedge x < y) \wedge \exists y\, (Q_a y \wedge y < x)),$$

while $w$ does not. On the other hand, Duplicator has a winning strategy in the two-round
game in $aaaab$, $aaab$.
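
For short words, the winner of such a game can be determined by exhaustive search. The following sketch handles only the signature consisting of $<$ and the letter predicates; the function name `duplicator_wins` is hypothetical, and the early consistency check is a pruning shortcut that does not change the winner, since a disagreement between pebbles already placed persists to the end of the game.

```python
from functools import lru_cache

def duplicator_wins(w, w2, rounds):
    """Decide the Ehrenfeucht-Fraisse game on (w, w2) with the order predicate <."""
    def consistent(p, q):
        # Pebble i sits on position p[i] of w and on position q[i] of w2.
        for i in range(len(p)):
            if w[p[i]] != w2[q[i]]:
                return False
            for j in range(len(p)):
                if (p[i] < p[j]) != (q[i] < q[j]):
                    return False
        return True

    @lru_cache(maxsize=None)
    def win(p, q, r):
        if not consistent(p, q):
            return False
        if r == 0:
            return True
        # Spoiler plays in w; Duplicator needs a good answer in w2.
        for x in range(len(w)):
            if not any(win(p + (x,), q + (y,), r - 1) for y in range(len(w2))):
                return False
        # Spoiler plays in w2; Duplicator needs a good answer in w.
        for y in range(len(w2)):
            if not any(win(p + (x,), q + (y,), r - 1) for x in range(len(w))):
                return False
        return True

    return win((), (), rounds)

print(duplicator_wins("aab", "aaab", 2))
print(duplicator_wins("aaaab", "aaab", 2))
```

The two calls reproduce the example: Spoiler wins the two-round game on $aab$, $aaab$, while Duplicator wins it on $aaaab$, $aaab$.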

What does this have to do with varieties? We will use games to show that
logically-defined language classes satisfy the closure properties that define varieties. Look,
for example, at the family of languages defined by FO[<] sentences of quantifier depth no
more than $d$, where $d \ge 0$. We will denote both this language family and the underlying
logic by $\mathrm{FO}_d[<]$.

Theorem 3.3. $\mathrm{FO}_d[<]$ is a variety of languages.

Proof. Since we have to discuss languages over different alphabets, let us denote by
$A^*\mathrm{FO}_d[<]$ the languages in $A^*$ that belong to this family. Obviously
$A^*\mathrm{FO}_d[<]$ is closed under boolean operations, so we must verify closure under
quotients and inverse images of morphisms. Let us write $w \sim_{d,A} w'$ to mean that
$w, w' \in A^*$ satisfy all the same sentences of $\mathrm{FO}_d[<]$. Then $\sim_{d,A}$ is an
equivalence relation of finite index on $A^*$, and every language of $A^*\mathrm{FO}_d[<]$ is a
union of $\sim_{d,A}$-classes. We claim that if $w \sim_{d,A} w'$ and $a \in A$, then both
$aw \sim_{d,A} aw'$ and $wa \sim_{d,A} w'a$, and that further, if $\varphi: A^* \to B^*$ is a
morphism, then $\varphi(w) \sim_{d,B} \varphi(w')$.

To see that this claim implies the result, suppose $L \in A^*\mathrm{FO}_d[<]$ but
$a^{-1}L \notin A^*\mathrm{FO}_d[<]$. Then there exist $w, w' \in A^*$ with $w \in a^{-1}L$,
$w' \notin a^{-1}L$, and $w \sim_{d,A} w'$. But then $aw \in L$, $aw' \notin L$, and
$aw \sim_{d,A} aw'$, contradicting $L \in A^*\mathrm{FO}_d[<]$. By the same reasoning we deduce
closure under right quotients and under inverse images of morphisms.

To prove the claim, note that by Theorem 3.2, $w \sim_{d,A} w'$ if and only if Duplicator has
a winning strategy in $\mathcal{G}_d(w, w', <)$. So we must show that such a winning strategy
implies the existence of winning strategies for Duplicator in $\mathcal{G}_d(aw, aw', <)$,
$\mathcal{G}_d(wa, w'a, <)$, and $\mathcal{G}_d(\varphi(w), \varphi(w'), <)$. For
$\mathcal{G}_d(wa, w'a, <)$, the strategy is this: Whenever Spoiler plays on the last letter of
either $wa$ or $w'a$, Duplicator responds by playing on the last letter of the other word;
otherwise Duplicator responds according to the winning strategy in $(w, w')$. The reasoning is
identical for $\mathcal{G}_d(aw, aw', <)$. For $\mathcal{G}_d(\varphi(w), \varphi(w'), <)$,
suppose $w = a_1 \cdots a_r$, $w' = a'_1 \cdots a'_s$, and let $v_i = \varphi(a_i)$,
$v'_i = \varphi(a'_i)$. Duplicator’s strategy is to keep track of a separate game in $w, w'$ to
calculate the responses in $\varphi(w), \varphi(w')$. If Spoiler plays on the $j$th symbol of
$v_i$, then Duplicator calculates the response, according to the original strategy, to a move by
Spoiler on $a_i$. Let us say this response is on $a'_k$. Observe that $a_i = a'_k$, and thus
$v_i = v'_k$, so Duplicator can reply on the $j$th symbol of $v'_k$. In other words, Duplicator
pulls Spoiler’s plays back to $(w, w')$, applies the original winning strategy, and pushes the
result forward to $(\varphi(w), \varphi(w'))$. It is easy to see that this strategy wins for
Duplicator. □

This same reasoning can be adapted to a large number of different situations. Consider,
for example, the logics $\mathrm{FO}_d[+1]$. The strategy-copying argument no longer works
to give Duplicator a winning strategy in $\mathcal{G}_d(\varphi(w), \varphi(w'), +1)$, because
$\varphi$ may map a letter to the empty word, and thus we might end up with two pebbles on
adjacent positions in $\varphi(w)$, but find the corresponding pebbles on non-adjacent positions
of $\varphi(w')$. But the argument does work for non-erasing morphisms, and thus each
$\mathrm{FO}_d[+1]$, as well as the union FO[+1], is a $\mathcal{C}_{ne}$-variety. Similarly,
suppose we augment the logic FO[<] by adjoining the predicate $x \equiv_q y$ for equivalence
modulo $q$. We now find that the strategy-copying argument works as long as all $\varphi(a)$
for $a \in A$ have the same length $m$, as $i \equiv_q j$ implies $mi \equiv_q mj$. Thus each
$\mathrm{FO}_d[<, \equiv_q]$ is a $\mathcal{C}_{lm}$-variety of languages.

This reasoning is amenable to further adaptations, by altering the rules of the games:
We obtain a game characterization of languages defined by formulas that use no more
than $p$ distinct variables by allowing only $p$ pebbles, regardless of the number of rounds.
Once all the pebbles have been placed, the Spoiler may pick up a pebble and move it to
a new position; the Duplicator must pick up the corresponding pebble and move it in the
same direction. We obtain a game characterization of the languages defined by boolean
combinations of $\Sigma_k$ sentences, with quantifier block size bounded by $d$, by considering
$k$-round games in which each player is permitted to place $d$ pebbles at a time. We can turn
this into a game characterization of the languages defined by $\Sigma_k$-sentences themselves by
requiring Spoiler to play in $w$ in the first round, in $w'$ in the second round, etc. Duplicator
then has a winning strategy in the game in $w, w'$ if and only if every $\Sigma_k$-sentence, with
quantifier block size no more than $d$, that $w$ satisfies is also satisfied by $w'$. We can use
this to conclude that $\Sigma_k[<]$ is an ordered variety of languages. In all instances, we find
that some variant of Eilenberg’s Theorem applies, and extract the same conclusion: A
logical characterization of the language class implies the existence of an algebraic
characterization.

Care must be taken not to extrapolate this too far. For example, the strategy-copying
argument fails in the case of $\Sigma_1[+1]$: Let $w = abab$, $w' = baba$. Then $w, w'$ satisfy
the same $\Sigma_1[+1]$-sentences of block size 2, but $wa$ and $w'a$ do not, since $w'a$
contains two consecutive occurrences of $a$.


3.2 Explicit characterization of logically defined classes 

While the foregoing arguments tell us that logically defined language classes form
varieties, they do not provide explicit algebraic characterizations. There are, in fact, a number
of different methods for connecting the structure of defining sentences to algebraic
properties, and many results giving explicit characterizations of the language varieties defined
by various logics. (See, for instance, Straubing [49].) Here we give just a taste of these
techniques and results with what is perhaps the most famous, and certainly the first, result
in this area, the theorem of McNaughton and Papert [30] giving the equivalence of
first-order logic and aperiodic monoids:

Theorem 3.4. A language $L$ belongs to FO[<] if and only if Synt($L$) is aperiodic.

We will only prove one direction of this theorem, namely that first-order definability
implies aperiodicity. We claim that if $u \in A^*$, then $u^{2^d - 1} \sim_{d,A} u^{2^d}$. This is
proved by induction on $d$. For $d = 0$, there is nothing to prove, since all words are
equivalent modulo $\sim_{0,A}$. Suppose then that $d > 0$. We will show that Duplicator has a
winning strategy in $\mathcal{G}_d(u^{2^d - 1}, u^{2^d}, <)$. Suppose Spoiler plays $x_1$ in
$u^{2^d - 1}$. We have

$$u^{2^d - 1} = u^r v a v' u^s,$$

where the pebble is played on the position indicated by the letter $a$, $u = vav'$, and
$r + s = 2^d - 2$. It follows that either $r \ge 2^{d-1} - 1$ or $s \ge 2^{d-1} - 1$. Suppose the
former (the proof is the same in either case). Then we can write

$$u^{2^d} = u^{r+1} v a v' u^s.$$

Duplicator places the pebble $x'_1$ on the indicated $a$. Now play proceeds as follows: By the
inductive hypothesis, Duplicator has a winning strategy in
$\mathcal{G}_{d-1}(u^{2^{d-1} - 1}, u^{2^{d-1}})$. Thus, by the argument given in the proof of
Theorem 3.3, Duplicator has a winning strategy in $\mathcal{G}_{d-1}(u^r v, u^{r+1} v)$.
Duplicator will follow this strategy whenever Spoiler plays to the left of $x_1$ or $x'_1$, and
simply copy Spoiler’s move in $a v' u^s$ whenever the play is at or to the right of $x_1$ or
$x'_1$. This proves the claim. It follows that if $L$ is first-order definable, then Synt($L$)
satisfies the identity $x^m = x^{m+1}$ for sufficiently large $m$, and is thus aperiodic.

We omit the proof of the converse, that if Synt($L$) is aperiodic, then $L$ is in FO[<].
Most of the published proofs of this theorem rely on some decomposition theory for finite
semigroups, either the Krohn-Rhodes decomposition, or the ideal structure of semigroups.
Most proofs also show first that every language recognized by an aperiodic monoid is a
star-free language. We will define star-free languages in Section 4.2, and show that they
are equivalent to first-order definable languages. Pin [32] gives a relatively streamlined
proof using the ideal decomposition theory. Straubing [49] uses the Krohn-Rhodes
decomposition to obtain a first-order sentence directly. Wilke [55] gives a proof that is
remarkable for its absence of hard semigroup theory, and that produces a formula of
temporal logic directly from an automaton with an aperiodic transition monoid. □
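
Aperiodicity itself is easy to test on small examples. The sketch below assumes that the input is the minimal complete DFA of the language, so that its transition monoid is the syntactic monoid; the encoding of transitions as tuples and the function names are hypothetical. It builds the transition monoid explicitly and checks that the powers of every element stabilize, which takes time exponential in the number of states in the worst case, in line with Theorem 2.21.

```python
def compose(f, g):
    """The transformation 'f then g', matching the right action of words on states."""
    return tuple(g[q] for q in f)

def transition_monoid(delta, n):
    """All transformations of {0, ..., n-1} induced by words over the alphabet."""
    identity = tuple(range(n))
    elements, frontier = {identity}, [identity]
    while frontier:
        f = frontier.pop()
        for g in delta.values():
            h = compose(f, g)
            if h not in elements:
                elements.add(h)
                frontier.append(h)
    return elements

def is_aperiodic(delta, n):
    """True if every element x satisfies x^m = x^(m+1) for some m."""
    for x in transition_monoid(delta, n):
        powers = [x]
        nxt = compose(x, x)
        while nxt not in powers:
            powers.append(nxt)
            nxt = compose(nxt, x)
        # The powers of x end in a cycle; it is trivial iff the last power is fixed.
        if nxt != powers[-1]:
            return False
    return True

# Minimal DFA of (ab)*: state 0 initial and accepting, state 2 a sink.
ab_star = {"a": (1, 2, 2), "b": (2, 0, 2)}
# Minimal DFA of (aa)* over the one-letter alphabet {a}.
aa_star = {"a": (1, 0)}

print(is_aperiodic(ab_star, 3))
print(is_aperiodic(aa_star, 2))
```

The transition monoid of the first automaton has six elements and is aperiodic; that of the second is the two-element group, which is not.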

We can use Theorem 3.4 to deduce a claim we made earlier, giving an explicit
characterization of the $\mathcal{C}_{lm}$-pseudovariety QA:

Theorem 3.5. $L$ belongs to FO[<, $\equiv_m$] for some $m > 1$ if and only if the syntactic
morphism of $L$ is in QA.

We merely sketch the argument: Suppose $u \in A^+$ with $|u|$ divisible by $m$. Let $d > 0$.
Then by precisely the same argument as we gave in the proof of Theorem 3.4, Duplicator
has a winning strategy in $\mathcal{G}_d(u^r, u^{r+1}, <, \equiv_m)$ as long as $r$ is
sufficiently large compared to $d$. This is enough to show that if $L$ is definable by a sentence
of FO[<, $\equiv_m$], then the stable semigroup of $\eta_L$ is aperiodic. For the converse, we
consider a language $L$ with $\eta_L$ in QA. Let $\eta_L(A^t)$ be the stable semigroup. If we
treat $B = A^t$ as a finite alphabet, we can use Theorem 3.4 to obtain a first-order sentence,
with respect to $B$, defining the sets of words of length divisible by $t$ that are recognized by
$\eta_L$, and then translate this to a first-order sentence over $A$ by means of the predicate
$\equiv_t$.

Other logical formalisms. By and large, we have confined our discussion of logic to
the use of first-order quantification. But there are other formalisms studied in the
literature, which also give rise to varieties. We mention in passing two of these: Formulas
with modular quantifiers, which were introduced by Straubing, Thérien and Thomas [50]
and studied extensively in [49], and temporal formulas, which play an important role in
computer-aided verification. An algebraic treatment of temporal logic, and its connection
to varieties of languages, is due to Thérien and Wilke [53, 54] and Wilke [55, 56].


4 Operations on classes of languages 

The idea developed in this section is that certain operations on classes of languages
translate to operations on the corresponding sets of profinite identities, or on the corresponding
classes of syntactic objects (syntactic monoids or semigroups, ordered or not, etc.). This
translation, when it can be made explicit, may provide decomposition results, or membership
decision results for complex classes of languages.


4.1 Boolean operations 

If for each $i \in I$, $\mathcal{V}_i$ is a class of regular languages, the intersection
$\mathcal{W} = \bigcap_{i \in I} \mathcal{V}_i$ is the class given by
$A^*\mathcal{W} = \bigcap_{i \in I} A^*\mathcal{V}_i$ for each alphabet $A$. The different classes
of families of languages considered so far (lattices or boolean algebras of languages of some
fixed $A^*$, positive $\mathcal{C}$-varieties) are easily seen to be closed under (arbitrary)
intersection.

The following statement essentially follows from the definition of the satisfaction of 
profinite equations. 

Proposition 4.1. Let $I$ be a set and for each $i \in I$, let $E_i$ be a set of profinite equations
on an alphabet $A$. Then $\bigcap_{i \in I} \mathcal{L}(E_i) = \mathcal{L}(\bigcup_{i \in I} E_i)$.

In particular, if for each $i \in I$, $\mathcal{V}_i$ is a class of regular languages that is
$\mathcal{C}$-defined by a set of profinite (ordered) $\mathcal{C}$-identities $E_i$, then
$\bigcap_{i \in I} \mathcal{V}_i$ is $\mathcal{C}$-defined by $\bigcup_{i \in I} E_i$.

The fact that an arbitrary intersection of lattices of regular languages (resp. (positive) 
C-varieties) is again a lattice of regular languages (resp. a (positive) C-variety) has the 
following consequence: for each set V of regular languages in A* (resp. every class V of 
regular languages) there exists a least lattice (resp. a least (positive) C-variety) containing 
it, which is said to be generated by V (resp. V). 

The union of two lattices of languages in A* is not a lattice in general. The relevant 
operation is the join: the join of two lattices of regular languages in A* (resp. classes of 
regular languages) is defined to be the lattice generated by their union. 

Describing the profinite equations or identities defining a join is difficult. In fact,
Albert, Baldinger and Rhodes exhibit [1] a finite set $E$ of computable profinite identities,
such that the join of the pseudovariety $[\![E]\!]$ with the pseudovariety
$\mathbf{Com} = [\![xy = yx]\!]$ of commutative monoids is not decidable (see also [7]).

Some joins were computed early, based on the structural theory of monoids. This is
the case for instance of $\mathbf{J}_1 \vee \mathbf{G}$, which is characterized as the class of
finite monoids which are unions of groups and in which idempotents commute (see [25]). This
translates as

$$\mathbf{J}_1 \vee \mathbf{G} = [\![x^{\omega+1} = x,\ x^\omega y^\omega = y^\omega x^\omega]\!].$$

Other joins resisted computation until the advent of profinite methods, such as the joins
$\mathbf{R} \vee \mathbf{L}$ (Almeida and Azevedo [4]) and $\mathbf{G} \vee \mathbf{Com}$
(Almeida [2]). The case of $\mathbf{J} \vee \mathbf{G}$ is interesting, since this join is
decidable but is not defined by a finite set of profinite identities (Almeida, Azevedo and
Zeitoun [5], Steinberg [45, 46], Trotter and Volkov [58]).

Example 4.1. The following simple examples will be useful in the sequel. Let
$\mathbf{I} = [\![x = y]\!]$ be the trivial pseudovariety of monoids (which consists only of the
1-element monoid). Let also $\mathbf{K}$ and $\mathbf{D}$ be the pseudovarieties of semigroups
$\mathbf{K} = [\![x^\omega y = x^\omega]\!]$ and $\mathbf{D} = [\![y x^\omega = x^\omega]\!]$.
The elements of $\mathbf{K}$ are the finite semigroups in which idempotents act as zeroes on the
left. Dually, in the semigroups of $\mathbf{D}$, idempotents act like zeroes on the right. If
$\mathbf{V}$ is any pseudovariety of monoids, we let $\mathbf{LV}$ be the class of finite
semigroups $S$ such that $eSe \in \mathbf{V}$ for each idempotent $e$ of $S$. It is easily
verified that $\mathbf{LV}$ is a pseudovariety of semigroups, and that it is decidable if and
only if $\mathbf{V}$ is.

It is easily verified that the semigroups that are both in $\mathbf{K}$ and in $\mathbf{D}$ are
exactly the semigroups with a single idempotent, which is a zero (these semigroups are called
nilpotent). Interestingly, the join $\mathbf{K} \vee \mathbf{D}$ is equal to
$\mathbf{LI} = [\![x^\omega y x^\omega = x^\omega]\!]$.

4.2 Closure operations and Mal’cev products 

An early closure result is Schützenberger’s theorem on star-free languages. The set of
star-free languages on alphabet $A$ is the least boolean algebra containing the letters (and
the empty set), which is closed under concatenation. For instance, $aA^*$ is star-free, since
it is equal to $a\emptyset^c$. A non-trivial question is that of decidability: given a regular
language $L$, can we decide whether it is star-free? As it turns out, $(ab)^*$ is star-free (its
complement is the set of all words with two consecutive $a$’s or two consecutive $b$’s, or that
start with $b$ or end with $a$) but $(aa)^*$ is not.

The solution to this problem was given by Schützenberger [41] with the following
theorem.

Theorem 4.2. The class of star-free languages forms a variety of languages, corresponding
to the pseudovariety $\mathbf{Ap}$ of aperiodic monoids. In particular, this class is decidable.

In view of Theorem 3.4, this is equivalent to the following statement. 

Theorem 4.3. A language is star-free if and only if it is FO[<]-definable. 

Proof. We prove Theorem 4.3 using game-theoretic methods, as in Section 3. Let us first
show that an FO[<]-definable language is star-free. It is sufficient to show, by induction
on $k$, that for all $w \in A^*$ and $k \ge 0$, the $\sim_k$-class $[w]_k$ of $w$ is star-free. The
case $k = 0$ is trivial, since $[w]_0 = A^*$ for all $w \in A^*$. To prove the general case, we
will establish the equality

$$[w]_{k+1} = \bigcap\, [x]_k\, a\, [x']_k \ \setminus\ \bigcup\, [y]_k\, b\, [y']_k,$$

where the intersection is over all factorizations $w = xax'$ with $x, x' \in A^*$ and $a \in A$,
and the union is over all triples $([y]_k, b, [y']_k)$, where $b \in A$ and
$w \notin [y]_k\, b\, [y']_k$. By induction, the $\sim_k$-classes are star-free languages, so the
equality above implies that the $\sim_{k+1}$-classes are star-free as well.

To prove the equality, note that the inclusion from left to right is trivial, so we need
only show that if $w' \in A^*$ is in the set on the right-hand side, then $w \sim_{k+1} w'$. So we
will show that Duplicator has a winning strategy in the $(k+1)$-round game in the two words.
Observe that inclusion of $w'$ in the right-hand side means that $w, w'$ have precisely the
same set of factorizations with respect to $\sim_k$, in the sense that for every factorization
$xax'$ of one word, with $a \in A$, there exists a corresponding factorization $yay'$ of the other
word with $x \sim_k y$, $x' \sim_k y'$. Thus if Spoiler plays on a position in one of the words,
inducing a factorization $xax'$ of the word, Duplicator can play on the corresponding position
of the other. Duplicator can now correctly reply in the remaining $k$ rounds of the game by
using the winning strategies in the games in $(x, y)$ and $(x', y')$.

Conversely, let us show that every star-free language is FO[<]-definable. In view of
the definition of star-free languages, we need to show, first, that $A^*$ and every language
of the form $\{a\}$ ($a \in A$) is FO[<]-definable; and second that if $K$ and $L$ are
FO[<]-definable, then so are the boolean combinations of $K$ and $L$, and so is $KL$. The only
non-trivial point concerns the concatenation product, and the problem easily reduces to
showing that $KaL$ ($a \in A$) is FO[<]-definable.

Let us assume that $K$ and $L$ are defined by formulas of quantifier depth $k$. Let
$w \in KaL$, say, $w = uav$ with $u \in K$ and $v \in L$. We want to show that if
$w \sim_{k+1} w'$ (that is, Duplicator has a winning strategy for $\mathcal{G}_{k+1}(w, w')$),
then $w' \in KaL$. Let Spoiler put a pebble on the letter $a$ in $w$ witnessing the factorization
$w = uav$; then Duplicator’s strategy has her put a pebble on a letter $a$ in $w'$, determining a
factorization $w' = u'av'$. We claim that Duplicator wins the $k$-round game in $u$ and $u'$:
indeed, such a game can be seen as the 2nd, ..., $(k+1)$-st moves in a game in $w = uav$ and
$w' = u'av'$. Therefore $u \sim_k u'$ and hence $u' \in K$. Similarly $v' \in L$; thus
$w' \in KaL$. □
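
The factorization equality used in the first half of this proof yields a simple recursive way to compare words modulo $\sim_k$: the $\sim_{k+1}$-class of $w$ is determined by the set of triples $([x]_k, a, [x']_k)$ taken over the factorizations $w = xax'$. The sketch below computes such a canonical description; the name `rank_type` is hypothetical, and the code is practical only for short words and small $k$.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def rank_type(w, k):
    """A canonical description of the class of w modulo ~_k, for FO[<]."""
    if k == 0:
        return frozenset()          # all words are equivalent modulo ~_0
    return frozenset(
        (rank_type(w[:i], k - 1), w[i], rank_type(w[i + 1:], k - 1))
        for i in range(len(w))
    )

def equivalent(w, w2, k):
    """True if w and w2 satisfy the same FO[<] sentences of depth at most k."""
    return rank_type(w, k) == rank_type(w2, k)

print(equivalent("aab", "aaab", 2))
print(equivalent("aaaab", "aaab", 2))
print(equivalent("ab" * 3, "ab" * 4, 2))
```

The first two calls agree with the game example of Section 3.1, and the third is an instance of the claim $u^{2^d - 1} \sim_d u^{2^d}$ with $u = ab$ and $d = 2$.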

A natural extension of the question answered by Schützenberger’s theorem is the
following: can we characterize the varieties of languages which are closed under concatenation
product? And if $\mathcal{V}$ is a variety of languages, can we describe the least variety
containing $\mathcal{V}$ and closed under concatenation product? Both problems were solved by
Straubing [47]. In order to state his result, we need to introduce an operation on
pseudovarieties.

Let $\mathbf{V}$ be a pseudovariety of monoids and let $\mathbf{W}$ be a pseudovariety of
semigroups (resp. ordered semigroups). We consider the class of all finite monoids (resp.
ordered monoids) $M$ for which there exists a morphism (un-ordered) $\varphi: M \to N$ such that
$N \in \mathbf{V}$ and $\varphi^{-1}(e) \in \mathbf{W}$ for each idempotent element $e$ of $N$.
This class is not a pseudovariety in general, but it is elementary to verify that the quotients
(resp. ordered quotients) of its elements form a pseudovariety of monoids (resp. ordered
monoids), called the Mal’cev product of $\mathbf{V}$ by $\mathbf{W}$, and denoted
$\mathbf{W} \mathbin{ⓜ} \mathbf{V}$.

Theorem 4.4. Let $\mathcal{V}$ be a variety of languages and let $\mathbf{V}$ be the
corresponding pseudovariety of monoids. If $\mathcal{W}$ is the least variety of languages
containing $\mathcal{V}$ and closed under concatenation product, then the corresponding
pseudovariety of monoids is $\mathbf{Ap} \mathbin{ⓜ} \mathbf{V}$.

Schützenberger’s theorem above is the particular case of Theorem 4.4 when $\mathcal{V}$ is the
trivial variety of languages.

Interestingly, the Mal’cev product is also useful to characterize the closure of a variety
of languages under other types of products. For technical reasons, the definition of these
products involves intermediate, marker letters: If $K$ and $L$ are languages in $A^*$, and if
$a \in A$, we say that the product $KaL$ is deterministic if each word $u \in KaL$ has a
unique prefix in $Ka$. Co-deterministic products are defined dually: the product $KaL$ is
co-deterministic if each word $u \in KaL$ has a unique suffix in $aL$. Another important
modality of product is the following: a product $L_0 a_1 L_1 \cdots a_k L_k$ is unambiguous if
every word $u$ in this language admits a unique decomposition in the form
$u = u_0 a_1 u_1 \cdots a_k u_k$ with each $u_i \in L_i$. Deterministic and co-deterministic
products are particular cases of unambiguous products.

It is natural to extend these operations to classes of languages. Given a class of
languages $\mathcal{V}$, we denote by Det $\mathcal{V}$ the class of languages such that, for each
alphabet $A$, $A^*$ Det $\mathcal{V}$ is the set of all boolean combinations of languages of
$A^*\mathcal{V}$ and of deterministic products of these languages. Det $\mathcal{V}$ is called the
deterministic closure of $\mathcal{V}$. The co-deterministic closure coDet $\mathcal{V}$ and the
unambiguous closure UPol $\mathcal{V}$ are defined similarly. Schützenberger [42, 32]
characterized algebraically these operations for varieties of languages.

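Theorem 4.5. Let $\mathcal{V}$ be a variety of languages and let $\mathbf{V}$ be the
corresponding pseudovariety of monoids. Then Det $\mathcal{V}$, coDet $\mathcal{V}$ and
UPol $\mathcal{V}$ are varieties of languages, and the corresponding pseudovarieties of monoids
are $\mathbf{K} \mathbin{ⓜ} \mathbf{V}$, $\mathbf{D} \mathbin{ⓜ} \mathbf{V}$ and
$\mathbf{LI} \mathbin{ⓜ} \mathbf{V}$, respectively.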

Example 4.2. Consider the variety of languages $\mathcal{J}_1$, described in Sections 1.1 and
2.6.1: for each alphabet $A$, $A^*\mathcal{J}_1$ is the boolean algebra generated by the languages
of the form $B^*$, with $B \subseteq A$. It is elementary to verify that $A^*$ Det $\mathcal{J}_1$
is the boolean algebra generated by the products of the form
$A_0^* a_1 A_1^* \cdots a_k A_k^*$, such that for each $0 < i \le k$, $a_i \notin A_{i-1}$.
Theorem 4.5 tells us that Det $\mathcal{J}_1$ forms a variety of languages, and that the
corresponding pseudovariety of monoids is $\mathbf{K} \mathbin{ⓜ} \mathbf{J}_1$.

Semigroup theory helps us characterize this pseudovariety. $\mathbf{K} \mathbin{ⓜ} \mathbf{J}_1$ is the class $\mathbf{R}$
of all so-called $\mathcal{R}$-trivial finite monoids, that is, the monoids $M$ in which principal right
ideals have a single generator: $sM = tM$ implies $s = t$. In addition, one can show that
$\mathbf{R} = [\![(xy)^\omega x = (xy)^\omega]\!]$. This immediately implies the decidability of Det $\mathcal{J}_1$.
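
Both descriptions of $\mathbf{R}$ can be checked mechanically on a finite monoid given by its multiplication. The sketch below is only an illustration under that assumption: it tests the condition on principal right ideals directly, and separately tests the identity $(xy)^\omega x = (xy)^\omega$ by computing idempotent powers. The helper names are hypothetical.

```python
from itertools import product


def omega_power(x, mul):
    """The unique idempotent power of x in a finite monoid."""
    p = x
    while mul(p, p) != p:
        p = mul(p, x)
    return p


def is_r_trivial(elements, mul):
    """sM = tM implies s = t."""
    def right_ideal(s):
        return frozenset(mul(s, m) for m in elements)

    return all(s == t or right_ideal(s) != right_ideal(t)
               for s in elements for t in elements)


def satisfies_r_identity(elements, mul):
    """(xy)^omega x = (xy)^omega for all x, y."""
    for x, y in product(elements, repeat=2):
        e = omega_power(mul(x, y), mul)
        if mul(e, x) != e:
            return False
    return True


# U_1 = {0, 1} under multiplication, and the two-element group Z_2.
U1 = ([0, 1], lambda s, t: s * t)
Z2 = ([0, 1], lambda s, t: (s + t) % 2)
for elements, mul in (U1, Z2):
    print(is_r_trivial(elements, mul), satisfies_r_identity(elements, mul))
```

On these two examples the two tests agree: $U_1$ passes both, while the group $\mathbb{Z}_2$ fails both, since its two elements generate the same right ideal.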

A dual result characterizes $\mathbf{D} \mathbin{ⓜ} \mathbf{J}_1$, the pseudovariety associated with coDet $\mathcal{J}_1$, as
the class $\mathbf{L}$ of $\mathcal{L}$-trivial finite monoids. It is interesting to note that $\mathbf{R} \cap \mathbf{L} = \mathbf{J}$. The
variety of piecewise testable languages discussed in Section 1.2 is therefore the class of
languages that can be described simultaneously as boolean combinations of deterministic
and of co-deterministic products of the form $A_0^* a_1 A_1^* \cdots a_k A_k^*$ with each $A_i$ a subset of
$A$.

Similarly, Theorem 4.5 shows that the pseudovariety of monoids corresponding to
UPol $\mathcal{J}_1$ is $\mathbf{LI} \mathbin{ⓜ} \mathbf{J}_1$. Again, one can show that this pseudovariety is the class of finite
monoids in which every regular element is idempotent, usually denoted by $\mathbf{DA}$, and equal
to $[\![(xyz)^\omega z (xyz)^\omega = (xyz)^\omega]\!]$. It follows, here too, that UPol $\mathcal{J}_1$ is decidable. Let us
note in addition that it coincides with the class of languages that can be defined by FO[<]
sentences that use at most two variable symbols (see [52]).

The following result is of the same nature as Theorems 4.4 and 4.5, but it involves a
positive variety of languages and the corresponding pseudovariety of ordered monoids.
If $\mathcal{L}$ is a set of regular languages in $A^*$, we denote by Pol $\mathcal{L}$ (the polynomial closure of
$\mathcal{L}$) the lattice generated by the languages of the form $L_0 a_1 L_1 \cdots a_k L_k$, with $L_i \in \mathcal{L}$ and
$a_i \in A$ for each $i$. If $\mathcal{V}$ is a class of regular languages, then Pol $\mathcal{V}$ is the class such that,
for each alphabet $A$, $A^*\,\mathrm{Pol}\,\mathcal{V} = \mathrm{Pol}(A^*\mathcal{V})$. Then the following result holds, see [36].

Theorem 4.6. Let $\mathcal{V}$ be a variety of languages. Then Pol $\mathcal{V}$ is a positive variety of
languages, and the corresponding pseudovariety of ordered monoids is
$[\![x^\omega y x^\omega \le x^\omega]\!] \mathbin{ⓜ} \mathbf{V}$.

In general, the results reported above do not provide explicit decision algorithms, even 
if V is decidable (see [7]). However, the structural theory of semigroups yields some such 
results. In particular, we can use a result by Krohn, Rhodes and Tilson [28] to show 
that if V is decidable, then so are Det V, coDet V and UPol V (generalizing the specific 
instances discussed in Example 4.2). 

It is not known whether $\mathbf{Ap} \mathbin{ⓜ} \mathbf{V}$ is decidable whenever $\mathbf{V}$ is. A positive solution to this
problem would imply a positive solution to an open instance of the complexity problem,
which we discuss below in Section 4.3.

Topological methods [36] also provide sets of profinite identities describing Mal’cev
products. In the cases of interest to us, they yield the following statement.

Proposition 4.7. Let $\mathcal{V}$ be a variety of languages. Then the least variety containing $\mathcal{V}$
and closed under concatenation is defined by the set of profinite identities of the form
$x^{\omega+1} = x^\omega$, where $x \in \widehat{X^*}$ and $\mathbf{V}$ satisfies $x = x^2$.

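Similar statements hold for Det $\mathcal{V}$ (respectively, coDet $\mathcal{V}$, UPol $\mathcal{V}$ and Pol $\mathcal{V}$),
replacing the profinite identity $x^{\omega+1} = x^\omega$ by $x^\omega y = x^\omega$ (respectively, $y x^\omega = x^\omega$,
$x^\omega y x^\omega = x^\omega$ and $x^\omega y x^\omega \le x^\omega$), where $x, y \in \widehat{X^*}$ and $\mathbf{V}$ satisfies $x = x^2 = y$.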

These results were extended to $\mathcal{C}$-varieties and, in the case of Pol $\mathcal{V}$, to lattices of
regular languages closed under quotients [34, 15]. In practice, the resulting sets of
profinite identities are infinite and incomputable. However, in a number of situations, one can
extract from these sets more manageable, yet sufficient, subsets that yield decision
algorithms.

Example 4.3. Branco and Pin [15] use Proposition 4.7, applied to the lattice of slender
languages (see Section 2.6.8), to prove the decidability of the lattice generated by the
languages of the form $L_0 a_1 L_1 \cdots a_k L_k$, where the $L_i$ are either $A^*$ or of the form $u^*$ for
some $u \in A^*$.


4.3 Product operations and semidirect products 

We now consider products of the form $LaA^*$, where $L$ is a language and $a \in A$: $LaA^*$ is
the language of all words with a prefix in $La$. Given a monoid $M$ accepting $L$, one can
construct a monoid accepting $LaA^*$ using the operation of semidirect product.

In general, let $S$ and $T$ be monoids. A left action of $T$ on $S$ is a mapping
$\lambda\colon T \times S \to S$, written $(t, s) \mapsto t \cdot s$, such that for each $t$, the map
$\lambda_t\colon s \mapsto t \cdot s$ is an endomorphism of $S$, and such that the map $t \mapsto \lambda_t$ is a
morphism from $T$ to the monoid of endomorphisms of $S$. Once such an action $\lambda$ is given,
the semidirect product $S *_\lambda T$ (we usually write $S * T$) is the monoid of all pairs
$(s, t) \in S \times T$, with product

$$(s,t)(s',t') = (s\,\lambda(t, s'),\; tt').$$
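
As an illustration of this definition, the following sketch builds a semidirect product from finite monoids given by their multiplication and checks the action axioms by exhaustive search. The example (the additive group $\mathbb{Z}_2$ acting on $\mathbb{Z}_3$ by inversion) and all function names are assumptions made for the illustration, not taken from the text.

```python
from itertools import product


def is_left_action(S, mul_S, one_S, T, mul_T, one_T, act):
    """Each act(t, .) is an endomorphism of S and t -> act(t, .) is a morphism."""
    endomorphisms = all(
        act(t, mul_S(s, s2)) == mul_S(act(t, s), act(t, s2))
        for t in T for s in S for s2 in S
    ) and all(act(t, one_S) == one_S for t in T)
    morphism = all(
        act(mul_T(t, t2), s) == act(t, act(t2, s))
        for t in T for t2 in T for s in S
    ) and all(act(one_T, s) == s for s in S)
    return endomorphisms and morphism


def semidirect_product(S, mul_S, T, mul_T, act):
    """Elements and product of S * T, with (s,t)(s',t') = (s act(t,s'), tt')."""
    elements = [(s, t) for s in S for t in T]

    def mul(p, q):
        (s, t), (s2, t2) = p, q
        return (mul_S(s, act(t, s2)), mul_T(t, t2))

    return elements, mul


def is_associative(elements, mul):
    return all(mul(mul(p, q), r) == mul(p, mul(q, r))
               for p, q, r in product(elements, repeat=3))


# Z_3 and Z_2 written additively; the non-identity element of Z_2 inverts Z_3.
S, T = [0, 1, 2], [0, 1]
mul_S = lambda s, s2: (s + s2) % 3
mul_T = lambda t, t2: (t + t2) % 2
act = lambda t, s: s if t == 0 else (-s) % 3

elements, mul = semidirect_product(S, mul_S, T, mul_T, act)
print(is_left_action(S, mul_S, 0, T, mul_T, 0, act), is_associative(elements, mul))
```

With this action the resulting six-element monoid is a non-commutative group, even though both factors are commutative.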


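Lemma 4.8. If $\varphi\colon A^* \to T$ accepts the language $L$, then $U_1^T * T$ accepts $LaA^*$.

Proof. We consider the action $\lambda$ of $T$ on $U_1^T$ given by $\lambda(t, (s_x)_{x \in T}) = (s'_x)_{x \in T}$, with
$s'_x = s_{xt}$. Let then $\psi\colon A^* \to U_1^T * T$ be given, for each $b \in A$, by

$$\psi(b) = \big((s_x^{(b)})_{x \in T},\, \varphi(b)\big) \quad\text{with}\quad
s_x^{(b)} = \begin{cases} 0 & \text{if } x \in \varphi(L) \text{ and } b = a, \\ 1 & \text{otherwise.} \end{cases}$$

Using the definition of the product in $U_1^T * T$, we find that

$$\psi(a_1 \cdots a_n) = \big((r_x)_{x \in T},\, \varphi(a_1 \cdots a_n)\big) \quad\text{with}$$

$$r_x = s_x^{(a_1)}\, s_{x\varphi(a_1)}^{(a_2)} \cdots s_{x\varphi(a_1 \cdots a_{n-1})}^{(a_n)}
= \begin{cases} 0 & \text{if for some } 1 \le i \le n,\ x\varphi(a_1 \cdots a_{i-1}) \in \varphi(L) \text{ and } a_i = a, \\ 1 & \text{otherwise.} \end{cases}$$

In particular, we observe that $a_1 \cdots a_n \in LaA^*$ if and only if $r_1 = 0$. □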
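
The construction in this proof can be carried out on a small instance. The sketch below is an illustration only, with an instance chosen here: $T = \mathbb{Z}_2$ written additively (so that the identity of $T$ is 0), every letter mapped to 1, and $L$ the set of words of even length over $\{a, b\}$. It computes the image of a word in $U_1^T * T$ and compares the component indexed by the identity of $T$ with a direct test of membership in $LaA^*$.

```python
from itertools import product

T = (0, 1)        # Z_2 written additively; elements double as tuple indices
ONE_T = 0         # identity of T
ACCEPT = {0}      # image of L, for L = words of even length
MARKER = "a"


def mul_T(t, u):
    return (t + u) % 2


def phi(letter):
    return 1      # every letter contributes 1 to the length parity


def act(t, s):
    """The action of the proof: the x-component of the result is s_{xt}."""
    return tuple(s[mul_T(x, t)] for x in T)


def mul(p, q):
    """Product in U_1^T * T; U_1^T is multiplied componentwise."""
    (s, t), (s2, t2) = p, q
    shifted = act(t, s2)
    return (tuple(u * v for u, v in zip(s, shifted)), mul_T(t, t2))


def psi_letter(b):
    s = tuple(0 if (x in ACCEPT and b == MARKER) else 1 for x in T)
    return (s, phi(b))


def psi(word):
    result = ((1,) * len(T), ONE_T)
    for b in word:
        result = mul(result, psi_letter(b))
    return result


def in_LaAstar(word):
    """Direct test: some occurrence of the marker follows an even-length prefix."""
    return any(b == MARKER and i % 2 == 0 for i, b in enumerate(word))


words = ("".join(w) for n in range(7) for w in product("ab", repeat=n))
print(all((psi(w)[0][ONE_T] == 0) == in_LaAstar(w) for w in words))
```

The exhaustive comparison over all words of length at most 6 is a sanity check of the statement on this instance, not a substitute for the proof.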


Remark 4.9. Observe that the construction of the semidirect product $U_1^T * T$ given above
does not use anything special about $U_1$, and thus can be applied to any pair of monoids $U$
and $T$. The result is called the wreath product $U \circ T$. The wreath product is closely related to
the semidirect product, in the sense that, first, it is, of course, a semidirect product with $T$
of a member of the pseudovariety generated by $U$ and, second, every semidirect product
$U * T$ embeds in $U \circ T$. The wreath product, in a sense that can be made precise, captures
the notion of serial composition of automata [18]. As a consequence, it is frequently used,
exactly as in the proof of Lemma 4.8 above, to prove decomposition results.


The operation of semidirect product is naturally extended to pseudovarieties: if $\mathbf{V}$ and
$\mathbf{W}$ are pseudovarieties, we let $\mathbf{V} * \mathbf{W}$ be the pseudovariety generated by the semidirect
products $S * T$ with $S \in \mathbf{V}$ and $T \in \mathbf{W}$. Then we have the following theorem.

Theorem 4.10. Let $\mathcal{V}$ be a variety of languages and, for each alphabet $A$, let $A^*\mathcal{W}$ be the
boolean algebra generated by the languages of $A^*\mathcal{V}$ and the languages of the form $LaA^*$
with $L \in A^*\mathcal{V}$. Then the class of languages $\mathcal{W}$ is a variety and the corresponding
pseudovariety of monoids is $\mathbf{J}_1 * \mathbf{V}$.

Proof. Since $U_1 \in \mathbf{J}_1$, Lemma 4.8 shows that every language in $A^*\mathcal{W}$ is accepted by
a monoid in $\mathbf{J}_1 * \mathbf{V}$. The proof of the converse is a particular case of the more general
wreath product principle (Straubing [48]). Let $\varphi$ be a morphism $\varphi\colon A^* \to S * T$ and, for
each $a \in A$, let $\varphi(a) = (s_a, t_a)$. Let $\psi\colon A^* \to T$ be the morphism given by $\psi(a) = t_a$.
Let also $B = T \times A$ and let $\sigma\colon A^* \to B^*$ be the map

$$\sigma(a_1 \cdots a_n) = (1, a_1)\,(\psi(a_1), a_2) \cdots (\psi(a_1 \cdots a_{n-1}), a_n).$$

Note that $\sigma$ is a so-called sequential function [11, 39], not a morphism. We observe
however that, if $\chi\colon B^* \to S$ is the morphism given by $\chi(t, a) = t \cdot s_a$, then

$$\varphi(a_1 \cdots a_n) = (\chi\sigma(a_1 \cdots a_n),\, \psi(a_1 \cdots a_n)).$$

It follows that if $(s, t) \in S * T$, then $\varphi^{-1}(s, t) = \psi^{-1}(t) \cap \sigma^{-1}(\chi^{-1}(s))$. If $T \in \mathbf{V}$, then
$\psi^{-1}(t) \in A^*\mathcal{V}$. And if $S \in \mathbf{J}_1$, then $\chi^{-1}(s)$ is a language in $B^*\mathcal{J}_1$ and hence a boolean
combination of languages of the form $B^*(t, a)B^*$ ($(t, a) \in B$). Then $\sigma^{-1}(\chi^{-1}(s))$ is
a boolean combination of languages of the form $\sigma^{-1}(B^*(t, a)B^*)$. Now $\sigma(a_1 \cdots a_n) \in
B^*(t, a)B^*$ if and only if, for some $1 \le i \le n$, we have $(t, a) = (\psi(a_1 \cdots a_{i-1}), a_i)$,
that is, if and only if $a_1 \cdots a_n \in \psi^{-1}(t)aA^*$. In particular, $\sigma^{-1}(\chi^{-1}(s))$ and $\varphi^{-1}(s, t)$ are in
$A^*\mathcal{W}$, and so is any language accepted by $\varphi$. □
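
The factorization used in this proof can be checked on a small example. The sketch below is only an illustration: the monoids ($\mathbb{Z}_3$ and $\mathbb{Z}_2$ written additively, with $\mathbb{Z}_2$ acting by inversion) and the letter images in `LETTER_IMAGE` are hypothetical choices, not taken from the text. It computes the sequential function tagging each letter with the image in $T$ of the prefix before it, and compares the pair formed from it with the image of the word in $S * T$.

```python
from itertools import product

# S = Z_3 and T = Z_2, both written additively; T acts on S by inversion.
def mul_S(s, s2):
    return (s + s2) % 3


def mul_T(t, t2):
    return (t + t2) % 2


def act(t, s):
    return s if t == 0 else (-s) % 3


# Hypothetical images (s_a, t_a) of the letters in S * T.
LETTER_IMAGE = {"a": (1, 1), "b": (2, 0)}


def mul(p, q):
    (s, t), (s2, t2) = p, q
    return (mul_S(s, act(t, s2)), mul_T(t, t2))


def phi(word):
    """Image of a word in the semidirect product S * T."""
    result = (0, 0)
    for letter in word:
        result = mul(result, LETTER_IMAGE[letter])
    return result


def psi(word):
    """Image of a word in T (second components only)."""
    t = 0
    for letter in word:
        t = mul_T(t, LETTER_IMAGE[letter][1])
    return t


def sigma(word):
    """Sequential function: tag each letter with the psi-image of the prefix before it."""
    out, t = [], 0
    for letter in word:
        out.append((t, letter))
        t = mul_T(t, LETTER_IMAGE[letter][1])
    return out


def chi(b_word):
    """Morphism from B* to S sending (t, a) to t . s_a."""
    s = 0
    for t, letter in b_word:
        s = mul_S(s, act(t, LETTER_IMAGE[letter][0]))
    return s


words = ["".join(w) for n in range(7) for w in product("ab", repeat=n)]
print(all(phi(w) == (chi(sigma(w)), psi(w)) for w in words))
print(sigma("aa") == sigma("a") + sigma("a"))  # sigma is not a morphism
```

The second printed value is `False` because the tag of the second letter depends on the first, which is exactly what distinguishes a sequential function from a morphism.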

Remark 4.11. The semidirect product is a powerful tool to decompose pseudovarieties.
The operation $\mathbf{V} * \mathbf{W}$ is associative on pseudovarieties, and Krohn and Rhodes [27]
established that every finite monoid $M$ sits in an iterated product $\mathbf{X}_1 * \cdots * \mathbf{X}_k$ where each $\mathbf{X}_i$ is
either $\mathbf{G}$ or $\mathbf{Ap}$ (and the $\mathbf{G}$ and $\mathbf{Ap}$ factors alternate since $\mathbf{G} * \mathbf{G} = \mathbf{G}$ and $\mathbf{Ap} * \mathbf{Ap} = \mathbf{Ap}$).
This gives rise to a famous open problem, the so-called complexity problem: given $M$,
can we compute the minimum number of $\mathbf{G}$ factors in a product of $\mathbf{Ap}$ and $\mathbf{G}$ containing
$M$?

An analogous operation, the 2-sided semidirect product, can be used to handle the
products of the form $KaL$ ($K, L \subseteq A^*$). This time, we need to consider not only a left
action of $T$ on $S$ (as for the semidirect product), but also a right action of $T$ on $S$, a map
$\rho\colon S \times T \to S$, written $(s, t) \mapsto s \cdot t$, with the dual properties of a left action ($\rho_t\colon s \mapsto s \cdot t$
is an endomorphism of $S$ and $t \mapsto \rho_t$ is a morphism), and such that, for all $t, t' \in T$, $\lambda_t$
and $\rho_{t'}$ commute: $t \cdot (s \cdot t') = (t \cdot s) \cdot t'$. Then the 2-sided semidirect product of $S$ and $T$
(written $S ** T$) is the monoid of all pairs $(s, t) \in S \times T$, with product

$$(s,t)(s',t') = (\rho(s, t')\,\lambda(t, s'),\; tt').$$

Again, the operation is extended to pseudovarieties, by letting $\mathbf{V} ** \mathbf{W}$ be the pseudovariety
generated by the products $S ** T$ with $S \in \mathbf{V}$ and $T \in \mathbf{W}$. Then the following analogue
of Theorem 4.10 holds.

Theorem 4.12. Let $\mathcal{V}$ be a variety of languages and, for each alphabet $A$, let $A^*\mathcal{W}$ be the
boolean algebra generated by the languages of $A^*\mathcal{V}$ and the languages of the form $KaL$
with $K, L \in A^*\mathcal{V}$. Then the class $\mathcal{W}$ is a variety and the corresponding pseudovariety of
monoids is $\mathbf{J}_1 ** \mathbf{V}$.

Proof. The first step of the proof consists in verifying that if $K$ and $L$ are accepted by
a monoid $T \in \mathbf{V}$, then $KaL$ is accepted by $U_1^{T \times T} ** T$. (Note that if $K$ and $L$
are accepted by monoids $T_1$ and $T_2$, then they are both accepted by $T_1 \times T_2$, so it is
no restriction to assume that $K$ and $L$ are accepted by the same monoid.) This step is
performed essentially as in Lemma 4.8, and the details are left to the reader.

The second step is to prove that if $\varphi$ is a morphism $\varphi\colon A^* \to S ** T$ with $S \in \mathbf{J}_1$ and
$T \in \mathbf{V}$, then each $\varphi^{-1}(s, t)$ is in $A^*\mathcal{W}$. Here too, we use (a 2-sided version of) the wreath
product principle [59]. For each $a \in A$, let $\varphi(a) = (s_a, t_a)$. Let $\psi\colon A^* \to T$ be the
morphism given by $\psi(a) = t_a$, let $B = T \times A \times T$ and let $\sigma\colon A^* \to B^*$ be the map

$$\sigma(a_1 \cdots a_n) = (1, a_1, \psi(a_2 \cdots a_n))\,(\psi(a_1), a_2, \psi(a_3 \cdots a_n)) \cdots (\psi(a_1 \cdots a_{n-1}), a_n, 1).$$

Then, if $\chi\colon B^* \to S$ is the morphism given by $\chi(t, a, t') = (t \cdot s_a) \cdot t'$, we have
$\varphi(a_1 \cdots a_n) = (\chi\sigma(a_1 \cdots a_n),\, \psi(a_1 \cdots a_n))$.

We conclude as in the proof of Theorem 4.10. □

Remark 4.13. In view of Schützenberger’s theorem (Theorem 4.2 above), one can use
this result to show that the least pseudovariety closed under the operation $\mathbf{V} \mapsto \mathbf{J}_1 ** \mathbf{V}$
is the pseudovariety $\mathbf{Ap}$ of aperiodic monoids.

Semidirect product decomposition yields very difficult decision problems, such as the
complexity problem briefly described in Remark 4.11. Tilson showed that the
consideration of certain categories offers a systematic tool to understand semidirect (and 2-sided
semidirect) product decompositions ([57], see also [49]). Almeida and Weil combined
this category-theoretical approach with topological methods to provide sets of profinite
identities describing many instances of semidirect products [6]. As with Mal’cev
products, these sets are usually infinite and do not offer immediate solutions to decidability
problems, see [7].

For the products discussed in this section, [6] gives the following descriptions. 

Proposition 4.14. Let $\mathbf{V}$ be a pseudovariety of monoids. Then $\mathbf{J}_1 * \mathbf{V}$ is defined by the set
of profinite identities of the form $xy^2 = xy$ and $xyz = xzy$ for all $x, y, z \in \widehat{X^*}$ such that
$\mathbf{V}$ satisfies $xy = xz = x$.

$\mathbf{J}_1 ** \mathbf{V}$ is defined by the set of profinite identities of the form $xy^2x' = xyx'$ and
$xyzx' = xzyx'$ for all $x, y, z, x' \in \widehat{X^*}$ such that $\mathbf{V}$ satisfies $xy = xz = x$ and
$yx' = zx' = x'$.

In [6], this result is used to show the decidability of $\mathbf{J}_1 * \mathbf{J}$ and $\mathbf{J}_1 ** \mathbf{J}$.

It is also interesting to note that 2-sided semidirect products and category-theoretical
extensions of the notion of pseudovariety can be used to decompose unambiguous
products, that is, to decompose the operation $\mathbf{V} \mapsto \mathbf{LI} \mathbin{ⓜ} \mathbf{V}$, see [35].


5 Varieties in other algebraic frameworks 

The fundamental notions explored in this chapter (classes of algebras defined by
identities, properties preserved under products and quotients, etc.) properly belong to the
domain of universal algebra. We have applied these ideas to finite monoids, ordered finite
monoids, and stamps, but in fact they are applicable in a much wider variety of settings.
Here we will briefly discuss some of these extensions.

The study of varieties originates in the work of Garrett Birkhoff [12], who showed
that a family of algebras (defined in a very general sense) is closed under formation of
subalgebras, quotients and products if and only if it is defined by a set of identities. Such
families of algebras are called varieties because of a loose analogy with the varieties of
algebraic geometry defined by sets of polynomial equations. Note that the classes of
finite monoids that we have discussed are not varieties in this sense because they are
not, of course, closed under infinite direct products, nor even finite quotients of infinite
direct products, and consequently they cannot be defined by sets of explicit identities (as
opposed to profinite identities).

Efforts to adapt Birkhoff’s Theorem to finite algebras include work of Eilenberg and
Schützenberger [19], and of Baldwin and Berman [8], who both showed that
pseudovarieties are indeed defined by sets of identities, in the sense that an algebra belongs to a
pseudovariety if and only if it satisfies all but finitely many identities of the set. A
different treatment, and the one that we have followed here, based on identities in free profinite
algebras, was given by Reiterman, who proved the second part of Theorem 2.15 in the
setting of arbitrary finite algebras [38] (see also Banaschewski [9]).

The first part of Theorem 2.15, characterizing the language classes corresponding to
pseudovarieties of finite monoids, is from Eilenberg [18]. A generalization applicable to
pseudovarieties of single-sorted finite algebras is given by Almeida [3].

The ordered monoids considered in this chapter are not, strictly speaking, algebras,
but rather instances of finite L-structures, which are algebras together with a set of
relations compatible with the operations in the algebra. Pin and Weil [36] prove an analogue
of Reiterman’s Theorem for such structures. In this setting the profinite identities are
replaced by profinite relational identities. The profinite ordered identities discussed in this
chapter are a particular instance.

Variety theories of the kind described here have also been successfully extended to a 
number of many-sorted algebras that arise in the domain of automata theory, and which 
we briefly describe: 

Wilke [60] and Perrin and Pin [31] consider regular languages of infinite words. Here
the corresponding algebraic objects are two-sorted algebras called $\omega$-semigroups. These
are pairs $(S_f, S_\omega)$, where $S_f$ is a semigroup, and where there are additional operations
$S_f \times S_\omega \to S_\omega$ and $S_f \to S_\omega$. Here the free object (analogous to the free monoid in
the case of pseudovarieties of finite monoids) is the pair $(A^+, A^\omega)$ of finite and infinite
words over $A$. The three operations correspond to ordinary concatenation of finite words,
concatenation of a finite word and an infinite word to obtain an infinite word, and taking
the infinite power of a finite word to obtain an infinite word.

Ésik and Weil [20, 21] describe a theory of varieties for regular languages of ranked
trees. These are finite trees in which the nodes are labeled by letters of a finite alphabet
$\Sigma$ that is the disjoint union of subalphabets $\Sigma_0, \ldots, \Sigma_n$, where the label of a node with $k$
children belongs to $\Sigma_k$. In particular, the number of children of any node in such a tree
is bounded above by $n$. The corresponding algebraic objects are called finitary preclones.
These are sequences of finite sets $S_0, S_1, \ldots$. The operation takes an element $f$ of $S_k$ and
a sequence $g = (g_1, \ldots, g_k)$, where $g_i \in S_{m_i}$, and yields an element $f \cdot g$ of $S_m$, where
$m = m_1 + \cdots + m_k$. The free object is the sequence $(\Sigma M_0, \Sigma M_1, \ldots)$, where $\Sigma M_k$
consists of $k$-ary ranked trees; these are ranked trees in which $k$ of the leaves, reading
in left-to-right order, have been replaced by the variable symbols $v_1, \ldots, v_k$. In this free
preclone, the operation $f \cdot (g_1, \ldots, g_k)$ is that of replacing the $k$ variables in $f$ by the trees
$g_1, \ldots, g_k$ to obtain an $m$-ary ranked tree.

The theory can be extended as well to regular languages of finite unranked forests,
in which there is no bound on the degree of branching of the nodes (e.g., Bojańczyk and
Walukiewicz [14], Bojańczyk, Straubing and Walukiewicz [13]). Here the corresponding
algebraic objects are called forest algebras. These are pairs $(H, V)$ of monoids where $V$
acts on $H$. The letters $H$ and $V$ stand for ‘horizontal’ and ‘vertical’. The free object is the
pair $(H_A, V_A)$ where $H_A$ consists of forests labeled by letters of $A$, and $V_A$ consists of
contexts: forests in which the letter at one leaf has been deleted and replaced by a single
variable. The product in $H_A$ is simply concatenation of forests to obtain larger forests;
the product in $V_A$ is substitution of one context for the variable in another context; and the
action of $V_A$ on $H_A$ is substitution of a forest for the variable in a context so as to obtain
a larger forest.

For further details on this algebraic approach to the theory of regular tree languages,
we refer the reader to Chapter 22 in this Handbook.